0001 # KMime #
0003 [TOC]
0005 # Introduction # {#introduction}
0007 KMime is a library for handling mail messages and newsgroup articles. Both mail messages and
0008 newsgroup articles are based on the same standard called MIME, which stands for
0009 **Multipurpose Internet Mail Extensions**. In this document, the term *message* is used to
0010 refer to both mail messages and newsgroup articles.
0012 KMime deals solely with the in-memory representation of messages. Topics such as transport or storage
0013 of messages are handled by other libraries, for example by [the mailtransport library](https://api.kde.org/kdepim/kmailtransport/html/index.html)
0014 or by [the KIMAP library](https://api.kde.org/kdepim/kimap/html/index.html).
0015 Similarly, this library does not deal with displaying messages or advanced composing; for those tasks, there
0016 are the messageviewer and messagecomposer
0017 components in the KDE PIM [messagelib](https://api.kde.org/kdepim/messagelib/html/index.html) module.
0019 KMime's main function is to parse, modify and assemble messages in memory. A
0020 [later section](@ref string-broken-down) explains what *parsing* and *assembling* mean.
0021 KMime provides high-level classes that make these tasks easy.
0023 MIME is defined by various RFCs, see the [RFC section](@ref rfcs) for a list of them.
0025 # Structure of this document # {#structure}
0027 This document will first give an [introduction to the MIME specification](@ref mime-intro), as it is
0028 essential to understand the basics of the structure of MIME messages for using this library.
0029 The introduction here is aimed at users of the library. It gives a broad overview with examples and
0030 omits some details. Developers who wish to modify KMime should read the
0031 [corresponding RFCs](@ref rfcs) as well, but this is not necessary for library users.
0033 After the introduction to the MIME format, the two ways of representing a message in memory are
0034 discussed, the [string representation and the broken down representation](@ref string-broken-down).
0036 This is followed by a section giving an 
0037 [overview of the most important KMime classes](@ref classes-overview).
0039 The last sections give a list of [relevant RFCs](@ref rfcs) and provide links for
0040 [further reading](@ref links).
0042 # Structure of MIME messages # {#mime-intro}
0044 ## A brief history of the MIME standard ## {#history}
0046 The MIME standard is comparatively new (1993); email and Usenet existed long before the MIME standard came into
0047 existence. Because of this, the MIME standard has to keep backwards compatibility. The email
0048 standard before MIME lacked many capabilities, like encodings other than ASCII, or attachments. These
0049 and other things were later added by MIME. The standard for messages before MIME is defined in
0050 [RFC 5322](https://tools.ietf.org/html/rfc5322). In [RFC 2045](https://tools.ietf.org/html/rfc2045)
0051 to [RFC 2049](https://tools.ietf.org/html/rfc2049), several backward-compatible extensions
0052 to the basic message format are defined, adding support for attachments, different encodings and many
0053 others.
0055 Actually, there is an even older standard, defined in [RFC 733](https://tools.ietf.org/html/rfc733)
0056 (*Standard for the format of ARPA network text messages*, introduced in 1977).
0057 This standard is now obsoleted by RFC 5322, but backwards compatibility is in some cases supported, as
0058 there are still messages in this format around.
0060 Since pre-MIME messages had no way to handle attachments, attachments were sometimes added to the message
0061 text in an [uuencoded](https://en.wikipedia.org/wiki/Uuencoding) form. Although this is also
0062 obsolete, reading uuencoded attachments is still supported by KMime.
0064 After MIME was introduced, people realized that there was no way to have the filename of attachments
0065 encoded in anything other than ASCII. Thus, [RFC 2231](https://tools.ietf.org/html/rfc2231)
0066 was introduced to allow arbitrary encodings for parameter values, such as the attachment filename.
0068 ## MIME by examples ## {#examples}
0070 In the following sections, MIME message examples are shown, examined and explained, starting with
0071 a simple message and proceeding to more interesting examples.
0072 You can get additional examples by simply viewing the source of your own messages in your mail client,
0073 or by having a look at the examples in the [various RFCs](@ref rfcs).
0075 ### A simple message ### {#simple-email}
0077     Subject: First Mail
0078     From: John Doe <john.doe@domain.com>
0079     Date: Sun, 21 Feb 2010 19:16:11 +0100
0080     MIME-Version: 1.0
0082     Hello World!
0084 The above example features a very simple message. The two main parts of this message are the **header**
0085 and the **body**, which are separated by an empty line. The body contains the actual message content,
0086 and the header contains metadata about the message itself. The header consists of several **header fields**,
0087 each of them in their own line. Header fields are made up from the **header field name**, followed by a colon, followed
0088 by the **header field body**.
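
The header/body split described above can be sketched in a few lines. This example uses Python's standard `email` package purely for illustration; it is not KMime's API, and the manual split ignores folded (multi-line) header fields.

```python
from email import message_from_string

RAW = (
    "Subject: First Mail\n"
    "From: John Doe <john.doe@domain.com>\n"
    "Date: Sun, 21 Feb 2010 19:16:11 +0100\n"
    "MIME-Version: 1.0\n"
    "\n"
    "Hello World!\n"
)

# Manual split: the first empty line separates the header from the body.
header_text, body_text = RAW.split("\n\n", 1)

# Each header field is "name: body". Folded continuation lines are not
# handled here; this message does not contain any.
fields = dict(line.split(": ", 1) for line in header_text.split("\n"))

# The standard-library parser performs the same split and also handles folding.
msg = message_from_string(RAW)

print(fields["Subject"])   # header field body of the Subject field
print(msg["From"])         # same lookup through the parser
print(msg.get_payload())   # the message body
```
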
0090 The **MIME-Version** header field is mandatory for MIME messages. **Subject**,
0091 **From** and **Date** are important header fields; they are usually displayed in the message list of a
0092 mail client. The `Subject` header field can be anything, it does not have a special structure. It is a
0093 so-called **unstructured** header field. In contrast, the `From` and the `Date` header fields have
0094 to follow a special structure, they must be formed in a way that machines can parse. They are **structured**
0095 header fields. For example, a mail client needs to understand
0096 the `Date` header field so that it can sort the messages by date in the message list.
0097 The exact details of how the header field bodies of structured header fields should be
0098 formed are specified in an RFC.
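
As a small illustration of why the `Date` field has to be machine-readable, the following sketch parses a few date strings taken from the examples in this document and sorts them. It uses `email.utils.parsedate_to_datetime` from Python's standard library, not a KMime class.

```python
from email.utils import parsedate_to_datetime

# Date header field bodies from the example messages in this document.
dates = [
    "Mon, 22 Feb 2010 00:42:45 +0100",
    "Sun, 21 Feb 2010 19:16:11 +0100",
    "Sun, 28 Feb 2010 19:57:00 +0100",
]

# Because the field is structured, it can be turned into a real date object.
parsed = [parsedate_to_datetime(d) for d in dates]

# A message list sorted by date, oldest first.
oldest_first = sorted(parsed)
for moment in oldest_first:
    print(moment.isoformat())
```
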
0100 In this example, the `From` header contains a single email address. More precisely, a single email address is called
0101 a **mailbox**, which is made up of the **display name** (John Doe) and the **address specification** (john.doe@domain.com),
0102 which is enclosed in angle brackets. The `addr-spec` consists of the user name, the **local part**,
0103 and the **domain** name.
0105 Many header fields can contain multiple email addresses, for example the `To` field for messages with
0106 multiple recipients can have a comma-separated list of mailboxes.
0107 A list of mailboxes, together with a display name for the list, forms a **group**, and multiple groups can form an
0108 **address list**. Groups are rarely used in practice, however; you'll most often see a simple list of plain mailboxes.
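
The mailbox structure and the comma-separated list form can be taken apart as follows. This is a sketch with Python's `email.utils` helpers, chosen only for illustration; the `to_field` value is made up from addresses that appear elsewhere in this document.

```python
from email.utils import getaddresses, parseaddr

# A single mailbox: display name plus address specification.
display_name, addr_spec = parseaddr("John Doe <john.doe@domain.com>")

# The addr-spec consists of the local part and the domain.
local_part, domain = addr_spec.rsplit("@", 1)

# A header field body with several mailboxes (hypothetical To field).
to_field = "John Doe <john.doe@domain.com>, greg@domain.com"
mailboxes = getaddresses([to_field])

print(display_name, local_part, domain)
print(mailboxes)
```
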
0110 There are many more possible header fields than shown in this example, and the header can even contain
0111 arbitrary header fields, which usually are prefixed with `X-`, like `X-Face`.
0113 ### Encodings and charsets ### {#encodings}
0115     From: John Doe <john.doe@domain.com>
0116     Date: Mon, 22 Feb 2010 00:42:45 +0100
0117     MIME-Version: 1.0
0118     Content-Type: Text/Plain;
0119       charset="iso-8859-1"
0120     Content-Transfer-Encoding: quoted-printable
0122     Gr=FCezi Welt!
0124 The above shows a message that is using a different **charset** than the standard **US-ASCII** charset. The
0125 message body contains the string "Grüezi Welt!", which is **encoded** in a special way.
0127 The **content-type** of this message is **text/plain**, which means that the message is simple text. Later,
0128 other content types will be introduced, such as **text/html**. If there is no `Content-Type` header
0129 field, it is assumed that the content-type is `text/plain`.
0131 Before MIME was introduced, all messages were limited to the US-ASCII charset. Only the
0132 lower 128 byte values (0 to 127) were allowed to be used, the so-called **7-bit** range. Writing a message in
0133 another charset or using the upper 128 byte values (128 to 255) was not allowed.
0135 #### Charset Encoding ####
0137 When talking about charsets, it is important to understand how strings of text are converted to
0138 byte arrays, and the other way around. A message is nothing else than a big array of bytes.
0139 The bytes that form the body of the message somehow need to be interpreted as a text string. Interpreting
0140 a byte array as a text string is called **decoding** the text. Converting a text string to a byte array is called
0141 **encoding** the text. A **codec** (**co**der-**dec**oder) is a utility that can encode and decode text.
0142 In Qt, the class for text strings is QString, and the class for byte arrays is QByteArray. The base class
0143 of all codecs is QTextCodec.
0145 With the US-ASCII charset, encoding and decoding text is easy, one just has to look at an [ASCII table](https://en.wikipedia.org/wiki/ASCII_table)
0146 to be able to convert text strings to byte arrays and byte arrays to text strings. For
0147 example, the letter 'A' is represented by a single byte with the value of 65. When encountering a byte
0148 with the value 84, we can look that up in the table and see that it represents the letter 'T'.
0149 With the US-ASCII charset, each letter is represented by exactly one byte, which is very convenient.
0150 Even better, all letters commonly used in English text have byte values below 128, so the 7-bit limit
0151 of messages is no problem for text encoded with the US-ASCII charset.
0152 Another example: The string "Hello World!" is represented by the following byte array:
0154     48 65 6C 6C 6F 20 57 6F 72 6C 64 21
0156 Note that the byte values are written in hexadecimal form here, not in decimal as earlier.
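
The ASCII lookups above can be reproduced with a short Python sketch (Python is used here only as a convenient calculator; in Qt the corresponding types would be QString and QByteArray).

```python
text = "Hello World!"

# Encoding: text string -> byte array, using the US-ASCII charset.
encoded = text.encode("ascii")

# Show the byte values in hexadecimal, separated by spaces.
hex_bytes = " ".join(f"{byte:02X}" for byte in encoded)
print(hex_bytes)

# Decoding: byte values -> text string.
print(bytes([65]).decode("ascii"))  # the byte with decimal value 65
print(bytes([84]).decode("ascii"))  # the byte with decimal value 84
```
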
0158 Now, what if we want to write a message that contains German umlauts or Chinese letters? Those
0159 are not in the ASCII table, therefore a different charset has to be used. There is a wealth of charsets
0160 to choose from. Not all charsets can handle all letters, for example the
0161 [ISO-8859-1](https://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1) charset can handle
0162 German umlauts, but cannot handle Chinese or Arabic letters. The [Unicode standard](https://en.wikipedia.org/wiki/Unicode)
0163 is an attempt to introduce charsets that can handle all known letters in the
0164 world, in all languages. Unicode actually has several charsets, for example [UTF-8](https://en.wikipedia.org/wiki/UTF-8)
0165 and [UTF-16](https://en.wikipedia.org/wiki/UTF-16). In an ideal world, everyone would be using
0166 Unicode charsets, but for historic and legacy reasons, other charsets are still much in use.
0168 Charsets other than US-ASCII don't generally have as nice properties: A single letter can be represented
0169 by multiple bytes, and generally the byte values are not in the 7-bit range. Pay attention to the UTF-8
0170 charset: At first glance, it looks exactly like the US-ASCII charset, since common Latin letters like A - Z
0171 are encoded with the same byte values as with US-ASCII. However, characters outside the US-ASCII range are
0172 encoded with two or even more bytes. In general, one letter can be encoded in an arbitrary number of bytes, depending
0173 on the charset. One can **not** rely on the `1 letter == 1 byte` assumption.
0175 Now, what should be done when the text string "Grüezi Welt!" should be sent in the body of a message?
0176 The first step is to choose a charset that can represent all of its letters. This already excludes US-ASCII.
0177 Once a charset is chosen, the text string is encoded into a byte array.
0178 "Grüezi Welt!" encoded with the ISO-8859-1 charset produces the following byte array:
0180     47 72 FC 65 7A 69 20 57 65 6C 74 21
0182 The letter 'ü' here is encoded using a single byte with the value `FC`.
0183 The same string encoded with UTF-8 looks slightly different:
0185     47 72 C3 BC 65 7A 69 20 57 65 6C 74 21
0187 Here, the letter 'ü' is encoded with two bytes, `C3 BC`. Still, one can see the similarity
0188 between the two charsets for the other letters.
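
The two byte arrays above can be reproduced with the following sketch. It is plain Python, used only to make the charset step tangible; `hex_dump` is a helper defined here for display.

```python
def hex_dump(data: bytes) -> str:
    """Format a byte array as space-separated hexadecimal values."""
    return " ".join(f"{byte:02X}" for byte in data)

text = "Gr\u00fcezi Welt!"  # \u00fc is the letter 'ü'

latin1 = text.encode("iso-8859-1")
utf8 = text.encode("utf-8")

print(hex_dump(latin1))  # 'ü' is one byte
print(hex_dump(utf8))    # 'ü' is two bytes

# The number of letters and the number of bytes can differ.
print(len(text), len(latin1), len(utf8))

# US-ASCII cannot represent the text at all.
try:
    text.encode("ascii")
    ascii_possible = True
except UnicodeEncodeError:
    ascii_possible = False
```
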
0190 You can try this out yourself: Open your favorite text editor and enter some text with non-latin
0191 letters. Then save the file and view it in a hex editor to see how the text was converted to a
0192 byte array. Make sure to try out setting different charsets in your text editor.
0194 At this point, the text string is successfully converted to a byte array, using e.g. the ISO-8859-1
0195 charset. To indicate which charset was used, a **Content-Type** header field has to be added, with the correct
0196 **charset** parameter. In our example above, that was done. If the charset parameter of the `Content-Type`,
0197 or even the complete `Content-Type` header field is left out, the receiver can not know how to interpret
0198 the byte array! In these cases, the byte array is usually decoded incorrectly, and the text strings contain
0199 wrong letters or lots of question marks. There is even a special term for such wrongly decoded text,
0200 [Mojibake](https://en.wikipedia.org/wiki/Mojibake). It is important to always know what charset
0201 your byte array is encoded with, otherwise an attempt at decoding the byte array into a text string will fail and produce
0202 Mojibake. **There is no such thing as plain text!** If there is no `Content-Type` header field in
0203 a message, the message body should be interpreted as US-ASCII.
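
What happens when the charset information is missing or wrong can be demonstrated in a few lines. The sketch below, again in Python for illustration only, decodes the same UTF-8 byte array with three different charsets.

```python
utf8_bytes = "Gr\u00fcezi Welt!".encode("utf-8")

# Decoded with the charset that was actually used: correct text.
right = utf8_bytes.decode("utf-8")

# Decoded with a wrong charset: Mojibake, each of the two bytes of 'ü'
# is turned into a separate, unrelated letter.
wrong = utf8_bytes.decode("iso-8859-1")

# Decoded as US-ASCII (the fallback when no Content-Type is given):
# the bytes outside the 7-bit range cannot be interpreted at all and are
# replaced by the Unicode replacement character.
fallback = utf8_bytes.decode("ascii", errors="replace")

print(right)
print(wrong)
print(fallback)
```
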
0205 To learn more about charsets and encodings, read 
0206 [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/articles/Unicode.html)
0207 and [A tutorial on character code issues](https://www.cs.tut.fi/~jkorpela/chars.html). Especially
0208 the first article should really be read, as the name indicates.
0210 #### Content Transfer Encoding ####
0212 Now, we can't use the byte array that was just created in a message. The string encoded with ISO-8859-1
0213 has the byte value `FC` for the letter 'ü', which is decimal value 252. However, as said earlier,
0214 messages are only valid when all bytes are in the 7-bit range, i.e. have byte values below 128.
0215 So what should we do for byte values that are greater than 127, how can they be added to messages? The solution
0216 for this is to use a **content transfer encoding** (CTE). A content transfer encoding takes a byte
0217 array as input and transforms it. The output is another byte array, but one which only uses byte values
0218 in the 7-bit range. One such content transfer encoding is **quoted-printable** (QTP), which is used in the
0219 above example. Quoted-printable is easy to understand: When encountering a byte that has a value greater
0220 than 127, it is simply replaced by a '=', followed by the hexadecimal code of the byte value, represented
0221 as letters and digits encoded with ASCII. This means
0222 that a byte with the value 252 is replaced with the ASCII string `=FC`, since `FC`
0223 is the hexadecimal value of 252. The ASCII string `=FC` itself is now three bytes big,
0224 `3D 46 43`. Therefore, the quoted-printable encoding replaces each byte outside of the 7-bit
0225 range with 3 new bytes. A literal '=' in the input is itself encoded as `=3D`, so that decoding stays unambiguous. Decoding quoted-printable is also easy: Each time a byte with the value
0226 `3D`, which is the letter '=' in ASCII, is encountered, the next two following bytes are interpreted
0227 as the hex value of the resulting byte. The quoted-printable encoding was invented to make reading the
0228 byte array easy for humans.
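
The replacement rule can be written down directly. The function below is a deliberately simplified sketch, not a complete quoted-printable implementation: it escapes bytes above 127 and the '=' byte itself, and ignores the line-length and trailing-whitespace rules of the RFC. `qp_encode_simplified` is a name made up for this example; decoding is cross-checked with Python's standard `quopri` module.

```python
import quopri

def qp_encode_simplified(data: bytes) -> bytes:
    """Escape every byte outside the 7-bit range (and '=') as =XX."""
    out = bytearray()
    for byte in data:
        if byte > 127 or byte == 0x3D:
            out += b"=%02X" % byte  # '=' followed by two hex digits
        else:
            out.append(byte)        # 7-bit bytes are copied unchanged
    return bytes(out)

latin1 = "Gr\u00fcezi Welt!".encode("iso-8859-1")
utf8 = "Gr\u00fcezi Welt!".encode("utf-8")

print(qp_encode_simplified(latin1))
print(qp_encode_simplified(utf8))

# Decoding with the standard library restores the original byte array.
restored = quopri.decodestring(qp_encode_simplified(latin1))
```
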
0230 The quoted-printable encoding is not a good choice when the input byte array contains lots of bytes
0231 outside the 7-bit range, as the resulting byte array will be three times as big in the worst case,
0232 which is a waste of space. Therefore another content transfer encoding was introduced, **Base64**.
0233 The details of the base64 encoding are too much to write about here; refer to the
0234 [Wikipedia article](https://en.wikipedia.org/wiki/Base64) or the [RFC](https://tools.ietf.org/html/rfc2045#section-6.8)
0235 for details. As an example, the ISO-8859-1 encoded text string "Grüezi Welt!" is, after encoding it with base64,
0236 represented by the following ASCII string: `R3L8ZXppIFdlbHQh`.
0237 To express the same in byte arrays: The byte array `47 72 FC 65 7A 69 20 57 65 6C 74 21`
0238 is, after encoding it with base64,
0239 represented by the byte array `52 33 4C 38 5A 58 70 70 49 46 64 6C 62 48 51 68`.
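
The base64 values quoted above can be checked with Python's standard `base64` module. This is only a verification sketch; it says nothing about how KMime implements the encoding.

```python
import base64

latin1 = "Gr\u00fcezi Welt!".encode("iso-8859-1")
utf8 = "Gr\u00fcezi Welt!".encode("utf-8")

b64_latin1 = base64.b64encode(latin1)
b64_utf8 = base64.b64encode(utf8)

print(b64_latin1)  # ASCII-only output
print(b64_utf8)    # padded with '=' because 13 is not a multiple of 3

# The same result expressed as a byte array in hexadecimal.
hex_view = " ".join(f"{byte:02X}" for byte in b64_latin1)
print(hex_view)

# Every 3 input bytes become 4 output bytes.
print(len(latin1), len(b64_latin1))
```
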
0241 There are two other content transfer encodings besides quoted-printable and base64: **7-bit** and
0242 **8-bit**, written `7bit` and `8bit` in the header field. 7-bit is just a marker to indicate that no content transfer encoding is used. This is the
0243 case when the byte array is already completely in the 7-bit range, for example when writing English
0244 text using the US-ASCII charset. 8-bit is also a marker to indicate that no content transfer encoding
0245 was used. This time, not because it was not necessary, but because of a special exception, byte values
0246 outside of the 7-bit range are allowed. For example, some SMTP servers support the
0247 [8BITMIME](https://tools.ietf.org/html/rfc1652) extension, which indicates that they accept
0248 bytes outside of the 7-bit range. In this case, one can simply use the byte arrays as-is, without using
0249 any content transfer encoding. Creating messages with 8-bit content transfer encoding is currently not
0250 supported by KMime. The advantage of 8-bit is that there is no overhead in size, unlike with
0251 base64 or even quoted-printable.
0253 When using one of the four content transfer encodings, i.e. quoted-printable, base64, 7-bit or 8-bit, this
0254 has to be indicated in the header field **Content-Transfer-Encoding**. If the header field is left out,
0255 it is assumed that the content transfer encoding is 7-bit. The example above uses quoted-printable.
0257     From: John Doe <john.doe@domain.com>
0258     Date: Mon, 22 Feb 2010 00:42:45 +0100
0259     MIME-Version: 1.0
0260     Content-Type: Text/Plain;
0261       charset="iso-8859-1"
0262     Content-Transfer-Encoding: base64
0264     R3L8ZXppIFdlbHQh
0266 The same example, this time encoded with the base64 content transfer encoding.
0268     From: John Doe <john.doe@domain.com>
0269     Date: Mon, 22 Feb 2010 00:42:45 +0100
0270     MIME-Version: 1.0
0271     Content-Type: Text/Plain;
0272       charset="utf-8"
0273     Content-Transfer-Encoding: base64
0275     R3LDvGV6aSBXZWx0IQ==
0277 Again the same example, this time using UTF-8 as the charset.
0279     From: John Doe <john.doe@domain.com>
0280     Date: Mon, 22 Feb 2010 00:42:45 +0100
0281     MIME-Version: 1.0
0282     Content-Type: Text/Plain;
0283       charset="utf-8"
0284     Content-Transfer-Encoding: quoted-printable
0286     Gr=C3=BCezi Welt!
0288 The same example with a combination of UTF-8 and the quoted-printable CTE. As shown earlier, with the
0289 UTF-8 encoding, the letter 'ü' is represented by the two bytes `C3 BC`.
0291     From: John Doe <john.doe@domain.com>
0292     Date: Mon, 22 Feb 2010 00:42:45 +0100
0293     MIME-Version: 1.0
0294     Content-Type: Text/Plain;
0295       charset="utf-8"
0296     Content-Transfer-Encoding: 7bit
0298     Hello World
0300 A different example, showing 7-bit content transfer encoding. Although the UTF-8 charset has lots
0301 of letters that are represented by bytes outside of the 7-bit range, the string "Hello World" can
0302 be fully represented in the 7-bit range here, even with UTF-8.
0304 In the [further reading](@ref links) section, you will find links to web applications that demonstrate
0305 encodings and charsets.
0307 #### Conclusion ####
0309 When adding a text string to the body of a message, it needs to be encoded twice: First, the encoding of the charset
0310 needs to be applied, which transforms the text string into a byte array. Afterwards, the content transfer
0311 encoding has to be applied, which transforms the byte array from the first step into a byte array that
0312 only has bytes in the 7-bit range.
0314 When decoding, the same has to be done, in reverse: One first has to decode the byte array with the content transfer encoding, to get a byte
0315 array that has all 256 possible byte values. Afterwards, the resulting byte array needs to be decoded
0316 with the correct charset, to transform it into a text string. For those two decoding steps, one has to
0317 look at the `Content-Type` and the `Content-Transfer-Encoding` header fields to find the correct
0318 charset and CTE for decoding.
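
Both directions of this two-step process can be sketched end to end. The example uses Python's `quopri` and `email` modules as stand-ins, so the function names are Python's and not KMime's; the message is a shortened version of the quoted-printable example from this section.

```python
import quopri
from email import message_from_bytes

# Encoding: text string -> byte array (charset) -> 7-bit byte array (CTE).
text = "Gr\u00fcezi Welt!"
charset_bytes = text.encode("iso-8859-1")
body = quopri.encodestring(charset_bytes)

raw_message = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: Text/Plain;\n"
    b'  charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
) + body

# Decoding: read CTE and charset from the header, then undo both steps
# in reverse order.
msg = message_from_bytes(raw_message)
cte = msg["Content-Transfer-Encoding"]
charset = msg.get_content_charset()
undone_cte = msg.get_payload(decode=True)  # step 1: undo the CTE
decoded_text = undone_cte.decode(charset)  # step 2: undo the charset encoding

print(cte, charset, decoded_text)
```
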
0320 It is important to always keep the charset and the content transfer encoding in mind. Byte arrays and
0321 strings are not to be confused. Byte arrays that are encoded with a CTE are not to be confused with
0322 byte arrays that are **not** encoded with a CTE.
0324 This section showed how to use different charsets in the *body* of a message. The next section will
0325 show what to do when another charset is needed in one of the *header* field bodies.
0327 ### Encoding in Header Fields ### {#header-encoding}
0329 In the last section, we discussed how to use different charsets in the body of a message. But what if
0330 a different charset needs to be added to one of the header fields? For example one might want to write
0331 a mail to a mailbox with the display name "András Manţia" and with the subject "Grüezi!".
0333 The header fields are limited to characters in the 7-bit range, and are interpreted as US-ASCII.
0334 That means the header field names, such as "From: ", are all encoded in US-ASCII. The header field
0335 bodies, such as the "1.0" of `MIME-Version`, are also encoded with US-ASCII. This is mandated by
0336 [the RFC](https://tools.ietf.org/html/rfc5322#section-2).
0338 The `Content-Type` and the `Content-Transfer-Encoding` header fields only apply to the message body,
0339 they have no meaning for other header fields.
0341 This means that any letter in a different charset has to be encoded in some way to satisfy the RFC.
0342 Letters with a different charset are only allowed in some of the header field bodies; the header field
0343 names always have to be in US-ASCII.
0345     From: Thomas McGuire <thomas@domain.com>
0346     Subject: =?iso-8859-1?q?Gr=FCezi!?=
0347     Date: Mon, 22 Feb 2010 14:34:01 +0100
0348     MIME-Version: 1.0
0349     To: =?utf-8?q?Andr=C3=A1s?= =?utf-8?q?_Man=C5=A3ia?= <andras@domain.com>
0350     Content-Type: Text/Plain;
0351       charset="us-ascii"
0352     Content-Transfer-Encoding: 7bit
0354     bla bla bla
0356 The above example shows how text that is encoded with a different charset than US-ASCII is handled
0357 in the message header. This can be seen in the bodies of the `Subject` header field and the `To` header field.
0358 In this example, the body of the message is unimportant, it is just "bla bla bla" in US-ASCII.
0359 The way the header field bodies are encoded is sometimes referred to as an **RFC2047 string** or as an **encoded word**, which has
0360 its origin in the [RFC](https://tools.ietf.org/html/rfc2047) where this encoding scheme is defined.
0361 RFC2047 strings are only allowed in some of the header fields, like `Subject`, and in the display name
0362 of mailboxes in header fields like `From` and `To`. In other header fields, such as `Date` and
0363 `MIME-Version`, they are not allowed, but they wouldn't make much sense there anyway, since those are
0364 structured header fields with a clearly defined structure.
0366 RFC2047 strings start with "=?" and end with "?=". Between those markers, they consist of three parts:
0367 * The charset, such as "iso-8859-1"
0368 * The encoding, which is "q" or "b"
0369 * The encoded text
0371 These three parts are separated with a '?'. Encoding the third part, the text, is very similar to how
0372 text strings in the message body are encoded: First, the text string is encoded to a byte array using
0373 the charset encoding. Afterwards, the second encoding is used on the result, to ensure that all resulting
0374 bytes are within the 7-bit range.
0376 The *second encoding* here is almost identical to the content transfer encoding. There are two
0377 possible encodings, **b** and **q**. The `b` encoding is the same as the base64 encoding of the content
0378 transfer encoding. The `q` encoding is very similar to the quoted-printable encoding of the content
0379 transfer encoding, but with some little differences that are described in
0380 [the RFC](https://tools.ietf.org/html/rfc2047#section-4.2).
0382 Let's examine the subject of the message, `=?iso-8859-1?q?Gr=FCezi!?=`, in detail:
0384 The first part of the RFC2047 string is the charset, so it is ISO-8859-1 in this case. The second part
0385 is the encoding, which is the `q` encoding here. The last part is the encoded text, which is
0386 `Gr=FCezi!`. As with the quoted-printable encoding, "=FC" is the encoding for the byte with
0387 the value `FC`, which in the ISO-8859-1 charset is the letter 'ü'. The complete decoded
0388 text is therefore "Grüezi!".
0390 Each RFC2047 string in the header can use a different charset: In this example, the `Subject` uses ISO-8859-1,
0391 `To` uses UTF-8 and the message body uses US-ASCII.
0393 In the `To` header field, two RFC2047 strings are used. A single, bigger, RFC2047 string for the whole
0394 display name could also have been used. In this case, the second RFC2047 string starts with an underscore,
0395 which is decoded as a space in the `q` encoding. The space between the two RFC2047 strings is ignored,
0396 it is just used to separate the two encoded words.
0398 There are some restrictions on RFC2047 strings: They are not allowed to be longer than 75 characters,
0399 which means two or more encoded words have to be used for long text strings. Also, there are some
0400 restrictions on where RFC2047 strings are allowed; most importantly, the address specification must
0401 not be encoded, to be backwards compatible. For further details, refer to the RFC.
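
To make the structure of encoded words concrete, here is a minimal decoder sketch in Python. It is an illustration under simplifying assumptions, not a complete RFC 2047 implementation and not KMime's code: `decode_encoded_words` and `ENCODED_WORD` are names invented for this example, and the sketch does no validation of where encoded words are allowed. The `b`-encoded sample string was constructed for this sketch.

```python
import base64
import binascii
import re

# =?charset?encoding?encoded text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([qQbB])\?([^?]*)\?=")

def decode_encoded_words(field_body: str) -> str:
    # Whitespace between two adjacent encoded words is ignored.
    field_body = re.sub(r"(\?=)[ \t\r\n]+(=\?)", r"\1\2", field_body)

    def _decode(match: re.Match) -> str:
        charset, encoding, payload = match.groups()
        if encoding.lower() == "q":
            # header=True makes '_' decode to a space, as the q encoding requires.
            raw = binascii.a2b_qp(payload.encode("ascii"), header=True)
        else:
            raw = base64.b64decode(payload)
        # Second step: interpret the byte array with the stated charset.
        return raw.decode(charset)

    return ENCODED_WORD.sub(_decode, field_body)

subject = decode_encoded_words("=?iso-8859-1?q?Gr=FCezi!?=")
to = decode_encoded_words(
    "=?utf-8?q?Andr=C3=A1s?= =?utf-8?q?_Man=C5=A3ia?= <andras@domain.com>"
)
b_encoded = decode_encoded_words("=?utf-8?b?R3LDvGV6aSE=?=")

print(subject)
print(to)
print(b_encoded)
```
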
0403 ### Messages with attachments ### {#multipart-mixed}
0405 Until now, we only looked at messages that had a single text part as the message body. In this section,
0406 we'll examine messages with attachments.
0408     From: frank@domain.com
0409     To: greg@domain.com
0410     Subject: Nice Photo
0411     Date: Sun, 28 Feb 2010 19:57:00 +0100
0412     MIME-Version: 1.0
0413     Content-Type: Multipart/Mixed;
0414       boundary="Boundary-00=_8xriL5W6LSj00Ly"
0416     --Boundary-00=_8xriL5W6LSj00Ly
0417     Content-Type: Text/Plain;
0418       charset="us-ascii"
0419     Content-Transfer-Encoding: 7bit
0421     Hi Greg,
0423     attached you'll find a nice photo.
0425     --Boundary-00=_8xriL5W6LSj00Ly
0426     Content-Type: image/jpeg;
0427       name="test.jpeg"
0428     Content-Transfer-Encoding: base64
0429     Content-Disposition: attachment;
0430       filename="test.jpeg"
0433     [SNIP 800 lines]
0434     ze5CdSH2Z8yTatHSV2veW0rKzeq30//Z
0436     --Boundary-00=_8xriL5W6LSj00Ly--
0438 *Note: Since the image in this message would be really big, most of it is omitted / snipped here.*
0440 The above example consists of two parts: A normal text part and an image attachment. Messages that
0441 consist of multiple parts are called **multipart** messages. The top-level content-type therefore is
0442 **multipart/mixed**. `Mixed` simply means that the following parts have no relation to each other,
0443 it is just a random mixture of parts. Later, we will look at other types, such as `multipart/alternative`
0444 or `multipart/related`. A **part** is sometimes also called **node**, **content** or **MIME part**.
0446 Each MIME part of the message is separated by a **boundary**, and that boundary
0447 is specified in the top-level content-type header as a parameter. In the message body, the boundary
0448 is prefixed with `"--"`, and the last boundary is suffixed with `"--"`, so that the end of the message can
0449 be detected. When creating a message, care must be taken that the boundary appears nowhere else in the
0450 message, for example in the text part, as the parser would get confused by this.
0452 A MIME part begins right after the boundary. It consists of a **MIME header** and a **MIME body**, which
0453 are separated by an empty line. The MIME header should not be confused with the message header: The
0454 message header contains metadata about the whole message, like subject and date. The MIME header only
0455 contains metadata about the specific MIME part, like the content type of the MIME part. MIME header
0456 field names always start with `"Content-"`.
0457 The example above shows the three most important MIME header fields. Usually those are the only ones
0458 used. The top-level header of a message actually mixes the message metadata and the MIME metadata into one header: In this
0459 example, the header contains the `Date` header field, which is an ordinary header field, and it contains
0460 the `Content-Type` header field, which is a MIME header field.
0462 MIME parts can be nested, and therefore form a tree. The above example has the following tree:
0464     multipart/mixed
0465     |- text/plain
0466     \- image/jpeg
0468 The `text/plain` node is therefore a **child** of the `multipart/mixed` node. The `multipart/mixed` node
0469 is the **parent** of the other two nodes. The `image/jpeg` node is a **sibling** of the `text/plain` node.
0470 `Multipart` nodes are the only nodes that have children, other nodes are **leaf** nodes.
0471 The body of a multipart node consists of all complete child nodes (MIME header and MIME body), separated
0472 by the boundary.
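
The MIME tree of such a message can be walked programmatically. The sketch below uses Python's `email` parser as a stand-in for a MIME library; the message is a shortened copy of the example above, and the base64 payload is a four-byte placeholder rather than the real photo.

```python
from email import message_from_string

BOUNDARY = "Boundary-00=_8xriL5W6LSj00Ly"

RAW = "\n".join([
    "From: frank@domain.com",
    "To: greg@domain.com",
    "Subject: Nice Photo",
    "MIME-Version: 1.0",
    "Content-Type: Multipart/Mixed;",
    f'  boundary="{BOUNDARY}"',
    "",
    f"--{BOUNDARY}",
    "Content-Type: Text/Plain;",
    '  charset="us-ascii"',
    "Content-Transfer-Encoding: 7bit",
    "",
    "Hi Greg,",
    "",
    f"--{BOUNDARY}",
    "Content-Type: image/jpeg;",
    '  name="test.jpeg"',
    "Content-Transfer-Encoding: base64",
    "Content-Disposition: attachment;",
    '  filename="test.jpeg"',
    "",
    "/9j/2Q==",  # placeholder payload, not the photo from the example
    "",
    f"--{BOUNDARY}--",
    "",
])

msg = message_from_string(RAW)

# walk() visits the top-level node first, then its children in order.
types = [part.get_content_type() for part in msg.walk()]
print(types)

# Only the multipart node has children; the others are leaf nodes.
children = msg.get_payload()
text_part, image_part = children

# Each leaf has its own CTE; decode=True undoes it and returns the byte array.
image_bytes = image_part.get_payload(decode=True)
```
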
0474 Each MIME part can have a different content transfer encoding. In the above example, the text part has
0475 a `7bit` CTE, while the image part has a `base64` CTE. The multipart/mixed node does not specify
0476 a CTE, multipart nodes always have `7bit` as the CTE. This is because the body of multipart nodes can
0477 only consist of bytes in the 7 bit range: The boundary is 7 bit, the MIME headers are 7 bit, and the
0478 MIME bodies are already encoded with the CTE of the child MIME part, and are therefore also 7 bit. This means
0479 no CTE for multipart nodes is necessary.
0481 The MIME part for the image does not specify a charset parameter in the content type header field. This
0482 is because the body of that MIME part will not be interpreted as a text string, therefore the byte array
0483 does not need to be decoded to a string. Instead, the byte array is interpreted as an image, by an image
0484 renderer. The message viewer application passes the MIME part body as a byte array to the image renderer.
0485 The content type consists of a **media type** and a **subtype**. For example, the content type
0486 `"text/html"` has the media type "text" and the subtype "html". Only nodes that have the media type "text"
0487 need to specify a charset, as those nodes are the only nodes of which the body is interpreted as a text string.
0489 The only header field not yet encountered in previous sections is the **Content-Disposition** header field,
0490 which is defined in a [separate RFC](https://tools.ietf.org/html/rfc2183). It describes how
0491 the message viewer application should display the MIME part. In the case of the image part, it should
0492 be presented as an attachment. The **filename** parameter tells the message viewer application which filename
0493 should be used by default when the user saves the attachment to disk.
0495 The content type header field for the image MIME part has a **name** parameter, which is similar to the
0496 `filename` parameter of the `Content-Disposition` header field. The difference is that `name` refers
0497 to the name of the complete MIME part, whereas `filename` refers to the name of the attachment. The
0498 `name` parameter of the `Content-Type` header field in this case is superfluous and only exists for
0499 backwards compatibility, and can be ignored;
0500 the `filename` parameter of the `Content-Disposition` header field should be preferred when it is present.
0502     From: Thomas McGuire <thomas@domain.com>
0503     To: sebastian@domain.com
0504     Subject: Help with SPARQL
0505     Date: Sun, 28 Feb 2010 21:57:51 +0100
0506     MIME-Version: 1.0
0507     Content-Type: Multipart/Mixed;
0508       boundary="Boundary-00=_PjtiLU2PvHpvp/R"
0510     --Boundary-00=_PjtiLU2PvHpvp/R
0511     Content-Type: Text/Plain;
0512       charset="us-ascii"
0513     Content-Transfer-Encoding: 7bit
0515     Hi Sebastian,
0517     I have a problem with a SPARQL query, can you help me debug this? Attached is
0518     the query and a screenshot showing the result.
0520     --Boundary-00=_PjtiLU2PvHpvp/R
0521     Content-Type: text/plain;
0522       charset="UTF-8";
0523       name="query.txt"
0524     Content-Transfer-Encoding: 7bit
0525     Content-Disposition: attachment;
0526       filename="query.txt"
0528     prefix nco:<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>
0530     SELECT ?person
0531     WHERE {
0532      ?person a nco:PersonContact .
0533      ?person nco:birthDate ?birthDate .
    }
0535     --Boundary-00=_PjtiLU2PvHpvp/R
0536     Content-Type: image/png;
0537       name="screenshot.png"
0538     Content-Transfer-Encoding: base64
0539     Content-Disposition: attachment;
0540       filename="screenshot.png"
0543     [SNIP]
0544     YXJlLmpwZWcAZGlnaUthbS0w
0546     --Boundary-00=_PjtiLU2PvHpvp/R--
0548 The above example message consists of three MIME parts: The main text part and two attachments.
0549 One attachment has the media type `text`, therefore a charset parameter is necessary to correctly
0550 display it. The MIME tree looks like this:
0552     multipart/mixed
0553     |- text/plain
0554     |- text/plain
    \- image/png
0557 ### HTML Messages ### {#multipart-alternative}
0559     From: Thomas McGuire <thomas@domain.com>
0560     Subject: HTML test
0561     Date: Thu, 4 Mar 2010 13:59:18 +0100
0562     MIME-Version: 1.0
0563     Content-Type: multipart/alternative;
0564       boundary="Boundary-01=_m66jLd2/vZrH5oe"
0565     Content-Transfer-Encoding: 7bit
0567     --Boundary-01=_m66jLd2/vZrH5oe
0568     Content-Type: text/plain;
0569       charset="us-ascii"
0570     Content-Transfer-Encoding: 7bit
0572     Hello World
0574     --Boundary-01=_m66jLd2/vZrH5oe
0575     Content-Type: text/html;
0576       charset="us-ascii"
0577     Content-Transfer-Encoding: 7bit
0579     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
0580     <html>
0581       <head></head>
0582       <body>
0583         Hello <b>World</b>
0584       </body>
0585     </html>
0586     --Boundary-01=_m66jLd2/vZrH5oe--
The above example is a simple HTML message. It consists of a plain text part and an HTML part, which are
0589 in a **multipart/alternative** container. The message has the following structure:
0591     multipart/alternative
0592     |- text/plain
0593     \- text/html
0595 The HTML part and the plain text part have the identical content, except that the HTML part contains
0596 additional markup, in this case for displaying the word `World` in bold. Since those parts are in a
0597 multipart/alternative container, the message viewer application can freely choose which part it displays.
0598 Some users might prefer reading the message in HTML format, some might prefer reading the message
0599 in plain text format.
Of course, an HTML message could also consist of only a single `text/html` part, without the multipart/alternative
container and therefore without an alternative plain text part. However, people preferring the plain
text version wouldn't like this, especially if their mail client has no HTML engine and they would only see
the HTML source including all tags. Therefore, HTML messages should always include an alternative plain text part.
HTML messages can of course also contain attachments. In this case, the message contains both a
multipart/alternative and a multipart/mixed node, for example with the following structure, for an HTML
message that has an image attachment:
0610     multipart/mixed
0611     |- multipart/alternative
0612     |  |- text/plain
0613     |  \- text/html
0614     \- image/png
0616 The message itself would look like this:
0618     From: Thomas McGuire <thomas@domain.com>
0619     Subject: HTML message with an attachment
0620     Date: Thu, 4 Mar 2010 15:20:26 +0100
0621     MIME-Version: 1.0
0622     Content-Type: Multipart/Mixed;
0623       boundary="Boundary-00=_qG8jLwWCwkUfJV1"
0625     --Boundary-00=_qG8jLwWCwkUfJV1
0626     Content-Type: multipart/alternative;
0627       boundary="Boundary-01=_qG8jLfs1FRmlOhl"
0628     Content-Transfer-Encoding: 7bit
0630     --Boundary-01=_qG8jLfs1FRmlOhl
0631     Content-Type: text/plain;
0632       charset="us-ascii"
0633     Content-Transfer-Encoding: 7bit
0635     Hello World
0637     --Boundary-01=_qG8jLfs1FRmlOhl
0638     Content-Type: text/html;
0639       charset="us-ascii"
0640     Content-Transfer-Encoding: 7bit
0642     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
0643     <html>
0644       <head></head>
0645       <body>
0646         Hello <b>World</b>
0647       </body>
0648     </html>
0649     --Boundary-01=_qG8jLfs1FRmlOhl--
0651     --Boundary-00=_qG8jLwWCwkUfJV1
0652     Content-Type: image/png;
0653       name="test.png"
0654     Content-Transfer-Encoding: base64
0655     Content-Disposition: attachment;
0656       filename="test.png"
0659     [SNIP]
0660     eFkXsFgBMG4fJhYlx+iyB3cLpNZwYr/iP7teTwNYa7DZAAAAAElFTkSuQmCC
0662     --Boundary-00=_qG8jLwWCwkUfJV1--
0664 ### HTML Messages with Inline Images ### {#multipart-related}
0666 HTML has support for showing images, with the `img` tag. Such an image is shown at the place where
0667 the `img` tag occurs, which is called an **inline image**. Note that inline images are different
0668 from images that are just normal attachments: Normal attachments are always shown at the beginning or
0669 at the end of the message, while inline images are shown in-place. In HTML, the `img` tag points to an
0670 image file that is either a file on disk or a URL of an image on the Internet. To make inline images
0671 work with MIME messages, a different mechanism is needed, since the image is not a file on disk or on
0672 the Internet, but a MIME part somewhere in the same message. As specified in
0673 [RFC 2557](https://tools.ietf.org/html/rfc2557), the way this can be done is by referring
0674 to a **Content-ID** in the `img` tag, and marking the MIME part that is the image with that content
0675 ID as well.
An example will probably be clearer than this explanation:
0679     From: Thomas McGuire <thomas@domain.com>
    Subject: Inline Image Test
0681     Date: Thu, 4 Mar 2010 16:54:53 +0100
0682     MIME-Version: 1.0
0683     Content-Type: multipart/related;
0684       boundary="Boundary-02=_Nf9jLpJ2aGp5RQK"
0685     Content-Transfer-Encoding: 7bit
0687     --Boundary-02=_Nf9jLpJ2aGp5RQK
0688     Content-Type: multipart/alternative;
0689       boundary="Boundary-01=_Nf9jLZ6aPhm3WrN"
0690     Content-Transfer-Encoding: 7bit
0691     Content-Disposition: inline
0693     --Boundary-01=_Nf9jLZ6aPhm3WrN
0694     Content-Type: text/plain;
0695       charset="us-ascii"
0696     Content-Transfer-Encoding: 7bit
0698     Text before image
0700     Text after image
0702     --Boundary-01=_Nf9jLZ6aPhm3WrN
0703     Content-Type: text/html;
0704       charset="us-ascii"
0705     Content-Transfer-Encoding: 7bit
0707     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
0708     <html>
0709       <head></head>
0710       <body>
0711         Text before image<br>
0712         <img src="cid:547730348@KDE" /><br>
0713         Text after image
0714       </body>
0715     </html>
0716     --Boundary-01=_Nf9jLZ6aPhm3WrN--
0718     --Boundary-02=_Nf9jLpJ2aGp5RQK
0719     Content-Type: image/png;
0720       name="test.png"
0721     Content-Transfer-Encoding: base64
0722     Content-Id: <547730348@KDE>
0725     [SNIP]
0726     AABJRU5ErkJggg==
0727     --Boundary-02=_Nf9jLpJ2aGp5RQK--
The first thing you will probably notice in this example is that it has a **multipart/related** node with
0730 the following structure:
0732     multipart/related
0733     |- multipart/alternative
0734     |  |- text/plain
0735     |  \- text/html
0736     \- image/png
When the HTML part has an inline image, the HTML part and its image part both have to be children of a
0739 multipart/related container, like in this example.
0740 In this case, the `img` tag has the source `cid:547730348@KDE`, which is a placeholder that refers
0741 to the Content-Id header of another part. The image part contains exactly that value in its `Content-Id`
0742 header, and therefore a message viewer application can connect both.
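The way a viewer connects the two can be sketched as follows. The function name `contentIdForSource()` is hypothetical and not part of KMime. The sketch assumes the `cid:` scheme is written in lower case and ignores URL escaping, which a real implementation would have to handle.

```cpp
#include <string>

// Turns an img source such as "cid:547730348@KDE" into the value expected in
// the Content-Id header field of the image part, "<547730348@KDE>".
// Returns an empty string if the source does not use the cid: scheme.
std::string contentIdForSource(const std::string &src)
{
    const std::string scheme = "cid:";
    if (src.compare(0, scheme.size(), scheme) != 0) {
        return std::string();
    }
    return "<" + src.substr(scheme.size()) + ">";
}
```

A viewer would then look among the children of the multipart/related node for the part whose `Content-Id` header equals the returned value.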
0744 The plain text part cannot have inline images, therefore its text might seem a bit confusing.
HTML messages with inline images can of course also have attachments, in which case the message structure
0747 becomes a mix of multipart/related, multipart/alternative and multipart/mixed. The following example
0748 shows the structure of a message with two inline images and one `.tar.gz` attachment:
0750     multipart/mixed
0751     |- multipart/related
0752     |  |- multipart/alternative
0753     |  |  |- text/plain
0754     |  |  \- text/html
0755     |  |- image/png
0756     |  \- image/png
0757     \- application/x-compressed-tar
The structure of MIME messages can get arbitrarily complex; the above is just one relatively simple example.
The nesting of multipart nodes can get much deeper, as there is no restriction on nesting levels.
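Because nesting is unrestricted, a MIME tree is naturally a recursive data structure. The following sketch uses a hypothetical `MimeNode` type (not a KMime class; KMime uses `KMime::Content` for this purpose) and renders a tree in the same notation that is used for the structure diagrams in this document.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch: a content type plus child nodes.
// Only multipart nodes would have children.
struct MimeNode {
    std::string contentType;
    std::vector<MimeNode> children;
};

// Appends one line per descendant of node, indented according to its depth.
static void renderChildren(const MimeNode &node, const std::string &prefix, std::string &out)
{
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const bool last = (i + 1 == node.children.size());
        out += prefix + (last ? "\\- " : "|- ") + node.children[i].contentType + "\n";
        // Below the last child no vertical bar is drawn any more.
        renderChildren(node.children[i], prefix + (last ? "   " : "|  "), out);
    }
}

// Returns the whole tree as text, starting with the content type of the root.
std::string renderTree(const MimeNode &root)
{
    std::string out = root.contentType + "\n";
    renderChildren(root, "", out);
    return out;
}
```

The recursion depth equals the nesting depth of the message, so the same function handles both a flat multipart/mixed message and a deeply nested one.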
### Encapsulated Messages ### {#encapsulated}
0764 Encapsulated messages are messages which are attachments to another message. The most common example
0765 is a forwarded mail, like in this example:
0767     From: Frank <frank@domain.com>
0768     To: Bob <bob@domain.com>
0769     Subject: Fwd: Blub
0770     MIME-Version: 1.0
0771     Content-Type: Multipart/Mixed;
0772       boundary="Boundary-00=_sX+jLVPkV1bLFdZ"
0774     --Boundary-00=_sX+jLVPkV1bLFdZ
0775     Content-Type: text/plain;
0776       charset="us-ascii"
0777     Content-Transfer-Encoding: 7bit
0779     Hi Bob,
0781     hereby I forward you an interesting message from Greg.
0783     --Boundary-00=_sX+jLVPkV1bLFdZ
0784     Content-Type: message/rfc822;
0785       name="forwarded message"
0786     Content-Transfer-Encoding: 7bit
0787     Content-Description: Forwarded Message
0788     Content-Disposition: inline
0790     From: Greg <greg@domain.com>
0791     To: Frank <frank@domain.com>
0792     Subject: Blub
0793     MIME-Version: 1.0
0794     Content-Type: Text/Plain;
0795       charset="us-ascii"
0796     Content-Transfer-Encoding: 7bit
0798     Bla Bla Bla
0800     --Boundary-00=_sX+jLVPkV1bLFdZ--
0803     multipart/mixed
0804     |- text/plain
0805     \- message/rfc822
       \- text/plain
0808 The attached message is treated like any other attachment, and therefore the top-level content type
0809 is multipart/mixed.
0810 The most interesting part is the `message/rfc822` MIME part. As usual, it has some MIME headers, like
0811 `Content-Type` or `Content-Disposition`, followed by the MIME body. The MIME body in this case is
0812 the attached message. Since it is a message, it consists of a header and a body itself.
0813 Therefore, the `message/rfc822` MIME part appears to have two headers; in reality, it is the normal
0814 MIME header and the message header of the encapsulated message. The message header and the message body
0815 are both in the MIME body of the `message/rfc822` MIME part.
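The "two headers" can be made visible by splitting at the first empty line twice. The sketch below is plain standard C++ with a hypothetical function name, `splitHeadAndBody()`; it assumes LF line endings and does no validation. In real code, KMime does this splitting for you.

```cpp
#include <string>
#include <utility>

// Splits a MIME part or message at the first empty line.
// first:  the header block, including the line break of its last line
// second: everything after the empty line (empty if there is no empty line)
std::pair<std::string, std::string> splitHeadAndBody(const std::string &part)
{
    const std::string::size_type pos = part.find("\n\n");
    if (pos == std::string::npos) {
        return std::make_pair(part, std::string());
    }
    return std::make_pair(part.substr(0, pos + 1), part.substr(pos + 2));
}
```

Applied to the `message/rfc822` part, the first call yields the MIME header and the MIME body. Applying the function again to that MIME body yields the message header and the message body of the encapsulated message.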
0817 ### Signed and Encrypted Messages ### {#crypto}
0819 MIME messages can be cryptographically signed and/or encrypted. The format for those messages is
0820 defined in [RFC 1847](https://tools.ietf.org/html/rfc1847), which specifies two new
0821 multipart subtypes, **multipart/signed** and **multipart/encrypted**. The crypto format of these new
0822 security multiparts is defined in additional RFCs; the most common formats are
0823 [OpenPGP](https://tools.ietf.org/html/rfc3156) and [S/MIME](https://tools.ietf.org/html/rfc2633).
0824 Both formats use the principle of [public-key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography).
0825 OpenPGP uses **keys**, and S/MIME uses **certificates**. For easier text flow, only the term `key` will be used
0826 for both keys and certificates in the text below.
0828 Security multiparts only sign or encrypt a specific MIME part. The consequence is that the message headers
cannot be signed or encrypted. This also means that it is possible to sign or encrypt only some of
0830 the MIME parts of a message, while leaving other MIME parts unsigned or unencrypted. Furthermore, it
0831 is possible to sign or encrypt different MIME parts with different crypto formats. As you can see,
0832 security multiparts are very flexible.
0834 Security multiparts are not supported by KMime. However, it is possible for applications to use KMime
0835 when providing support for crypto messages. For example, the messageviewer
0836 component in KDE PIM's [messagelib](https://api.kde.org/kdepim/messagelib/html/index.html) supports signed and encrypted MIME parts, and the
0837 messagecomposer library can create
0838 such messages.
0840 Signed MIME parts are signed with the private key of the sender, and everybody who has the
0841 public key of the sender can verify the signature. Encrypted MIME parts are encrypted with the public
0842 key of the receiver, and only the receiver, who is the sole person possessing the private key, can decrypt
0843 it. Sending an encrypted message to multiple recipients therefore means that the message has to be sent
0844 multiple times, once for each receiver, as each message needs to be encrypted with a different key.
0846 #### Signed MIME parts ####
0848 A multipart/signed MIME part has exactly two children: The first child is the content that is signed,
0849 and the second child is the signature.
0851     From: Thomas McGuire <thomas@domain.com>
0852     Subject: My Subject
0853     Date: Mon, 15 Mar 2010 12:20:16 +0100
0854     MIME-Version: 1.0
0855     Content-Type: multipart/signed;
0856       boundary="nextPart2567247.O5e8xBmMpa";
0857       protocol="application/pgp-signature";
0858       micalg=pgp-sha1
0859     Content-Transfer-Encoding: 7bit
0861     --nextPart2567247.O5e8xBmMpa
0862     Content-Type: Text/Plain;
0863       charset="us-ascii"
0864     Content-Transfer-Encoding: 7bit
0866     Simple message
0868     --nextPart2567247.O5e8xBmMpa
0869     Content-Type: application/pgp-signature; name=signature.asc
0870     Content-Description: This is a digitally signed message part.
0872     -----BEGIN PGP SIGNATURE-----
0873     Version: GnuPG v2.0.14 (GNU/Linux)
0875     iEYEABECAAYFAkueF/UACgkQKglv3sO8a1MdTACgnBEP6ZUal931Vwu7PyiXT1bn
0876     Zr0Anj4bAI9JhHEDiwA/iwrWGfSC+Nlz
0877     =d2ol
0878     -----END PGP SIGNATURE-----
0879     --nextPart2567247.O5e8xBmMpa--
0882     multipart/signed
0883     |- text/plain
0884     \- application/pgp-signature
The example here uses the OpenPGP format to sign a simple plain text message. Here, the text/plain
0887 MIME part is signed, and the application/pgp-signature MIME part contains the signature data, which in
0888 this case is ASCII-armored.
As said above, it is possible to sign only some MIME parts. A message that has a signed image/jpeg attachment
but an unsigned main text part has the following MIME structure:
0893     multipart/mixed
0894     |- text/plain
0895     \- multipart/signed
       |- image/jpeg
       \- application/pgp-signature
0899 It is possible to sign multipart parts as well. Consider the above example that has a plain text part
0900 and an image attachment. Those two parts can be signed together, with the following structure:
0902     multipart/signed
0903     |- multipart/mixed
0904     |  |- text/plain
0905     |  \- image/jpeg
0906     \- application/pgp-signature
0908 Signed messages in the S/MIME format use a different content type for the signature data, like here:
0910     multipart/signed
0911     |- text/plain
0912     \- application/x-pkcs7-signature
0914 #### Encrypted MIME parts ####
0916 Multipart/encrypted MIME parts also have exactly two children: The first child contains metadata about
0917 the encrypted data, such as a version number. The second child then contains the actual encrypted data.
0919     From: someone@domain.com
0920     To: Thomas McGuire <thomas@domain.com>
0921     Subject: Encrypted message
0922     Date: Mon, 15 Mar 2010 12:50:16 +0100
0923     MIME-Version: 1.0
0924     Content-Type: multipart/encrypted;
0925       boundary="nextPart2726747.j47xUGTWKg";
0926       protocol="application/pgp-encrypted"
0927     Content-Transfer-Encoding: 7bit
0929     --nextPart2726747.j47xUGTWKg
0930     Content-Type: application/pgp-encrypted
0931     Content-Disposition: attachment
0933     Version: 1
0934     --nextPart2726747.j47xUGTWKg
0935     Content-Type: application/octet-stream
0936     Content-Disposition: inline; filename="msg.asc"
0938     -----BEGIN PGP MESSAGE-----
0939     Version: GnuPG v2.0.14 (GNU/Linux)
0941     hQIOA8p5rdC5CBNfEAf+NZVzVq48C1r5opOOiWV96+FUzIWuMQ6u8fzFgI7YVyCn
0942     [SNIP]
    =reNr
    -----END PGP MESSAGE-----
    --nextPart2726747.j47xUGTWKg--
0948     multipart/encrypted
0949     |- application/pgp-encrypted
0950     \- application/octet-stream
0952 The encrypted data is contained in the `application/octet-stream` MIME part. Without decrypting
0953 the data, it is unknown what the original content type of the encrypted MIME data is! The encrypted
0954 data could be a simple text/plain MIME part, an image attachment, or a multipart part. The encrypted
0955 data contains both the MIME header and the MIME body of the original MIME part, as the header is needed
0956 to know the content type of the data. The data could as well be of content type multipart/signed, in
0957 which case the message would be both signed and encrypted.
0959 #### Inline Crypto Formats ####
0961 Although using the security multiparts `multipart/signed` and `multipart/encrypted` is the recommended
0962 standard, there are other possibilities to sign or encrypt a message. The most common methods are
0963 **Inline OpenPGP** and **S/MIME Opaque**.
For inline OpenPGP messages, the crypto data is contained inline in the actual MIME part. For example,
0966 a message with a signed text/plain part might look like this:
0968     From: someone@domain.com
0969     To: someoneelse@domain.com
0970     Subject: Inline OpenPGP test
0971     MIME-Version: 1.0
0972     Content-Type: text/plain;
0973       charset="us-ascii"
0974     Content-Transfer-Encoding: 7bit
0975     Content-Disposition: inline
0977     -----BEGIN PGP SIGNED MESSAGE-----
0978     Hash: SHA1
0980     Inline OpenPGP signed example.
0981     -----BEGIN PGP SIGNATURE-----
0982     Version: GnuPG v2.0.14 (GNU/Linux)
0984     iEYEARECAAYFAkueJ2EACgkQKglv3sO8a1MS3QCfcsYnJG7uYQxzxz6J5cPF7lHz
0985     WIoAn3PjVPlWibu02dfdFObwd2eJ1jAW
0986     =p3uO
0987     -----END PGP SIGNATURE-----
0989 Encrypted inline OpenPGP works in a similar way. Opaque S/MIME messages are also similar: For signed
0990 MIME parts, both the signature and the signed data are contained in a single MIME part with a content
0991 type of `application/pkcs7-mime`.
0993 As security multiparts are preferred over inline OpenPGP and over opaque S/MIME, I won't go into more
0994 detail here.
0996 ### Miscellaneous Points about Messages ### {#misc}
0998 #### Line Breaks ####
1000 Each line in a MIME message has to end with a **CRLF**, which is a carriage return followed by a
1001 newline, which is the escape sequence `\\r\\n`. CR and LF may not appear in other places in
1002 a MIME message. Special care needs to be taken with encoded line breaks in binary data, and with
1003 distinguishing soft and hard line breaks when converting between different content transfer encodings.
1004 For more details, have a look at the RFCs.
1006 While the official format is to have a CRLF at the end of each line, KMime only expects a single LF
1007 for its in-memory storage. Therefore, when loading a message from disk or from a server into KMime, the CRLFs need
1008 to be converted to LFs first, for example with KMime::CRLFtoLF(). The opposite needs to be done when
1009 storing a KMime message somewhere.
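In application code, `KMime::CRLFtoLF()` mentioned above is the function to use. The following standard C++ sketch only illustrates what such a conversion involves in both directions; `crlfToLf()` and `lfToCrlf()` are hypothetical names and make no claim to match KMime's implementation.

```cpp
#include <string>

// Converts every CRLF pair to a single LF, for in-memory handling.
std::string crlfToLf(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            continue; // drop the CR; the LF is copied in the next iteration
        }
        out.push_back(in[i]);
    }
    return out;
}

// Converts every LF that is not already preceded by a CR to CRLF,
// for storing or sending the message.
std::string lfToCrlf(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
            out.push_back('\r');
        }
        out.push_back(in[i]);
    }
    return out;
}
```

For input that uses CRLF consistently, the two functions are inverses of each other, so a message survives a load and store cycle unchanged.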
1011 Lines should not be longer than 78 characters and may not be longer than 998 characters.
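Both limits can be checked with a simple scan over the lines. This sketch assumes LF line endings and counts characters without the line break; `LineLengthCheck` and `checkLineLengths()` are hypothetical names, not KMime API.

```cpp
#include <cstddef>
#include <string>

enum class LineLengthCheck {
    Ok,                 // no line is longer than 78 characters
    ExceedsRecommended, // some line is longer than 78, none longer than 998
    ExceedsMaximum      // some line is longer than 998 characters
};

// Classifies a text by the length of its longest line.
LineLengthCheck checkLineLengths(const std::string &text)
{
    LineLengthCheck result = LineLengthCheck::Ok;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string::npos) {
            end = text.size(); // last line without a trailing line break
        }
        const std::size_t length = end - start;
        if (length > 998) {
            return LineLengthCheck::ExceedsMaximum;
        }
        if (length > 78) {
            result = LineLengthCheck::ExceedsRecommended;
        }
        start = end + 1;
    }
    return result;
}
```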
1013 #### Header Folding and CFWS ####
1015 Header fields can span multiple lines, which was already shown in some of the examples above where
1016 the parameters of the header field value were in the next line. The header field is said to be
1017 **folded** in this case. In general, header fields can be folded whenever whitespace (**WS**) occurs.
1019 Header field values can contain **comments**; these comments are semantically invisible and have no
1020 meaning. Comments are surrounded by parentheses.
1022     Date: Thu, 13
1023           Feb 1969 23:32 -0330 (Newfoundland Time)
1025 This example shows a folded header that also has a comment (*Newfoundland Time*). The date header is a structured header
field, and therefore it has to obey a defined syntax; however, adding comments and whitespace is
allowed almost anywhere, and they are ignored when parsing the message. Comments and whitespace where
folding is allowed are sometimes referred to as **CFWS**. Any occurrence of CFWS is semantically regarded
as a single space.
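Undoing the folding is the simple half of this: a line break that is directly followed by a space or a tab is removed, and the whitespace itself is kept. The sketch below shows only that step, with a hypothetical function name and LF line endings assumed. It does not remove comments, which is considerably harder because comments can be nested and parentheses can also occur inside quoted strings.

```cpp
#include <string>

// Joins the continuation lines of folded header fields with their first line.
// A line break is dropped only if the next character is a space or a tab.
std::string unfoldHeader(const std::string &folded)
{
    std::string out;
    out.reserve(folded.size());
    for (std::string::size_type i = 0; i < folded.size(); ++i) {
        const bool continuation = folded[i] == '\n' && i + 1 < folded.size()
            && (folded[i + 1] == ' ' || folded[i + 1] == '\t');
        if (!continuation) {
            out.push_back(folded[i]);
        }
    }
    return out;
}
```

Line breaks between two different header fields are left alone, because the following line then starts with a field name rather than with whitespace.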
# The two in-memory representations of messages # {#string-broken-down}
1033 There are two representations of messages in memory. The first is called **string representation**
1034 and the other one is called **broken-down representation**.
The term string representation is somewhat of a misnomer;
a better term would be "byte array representation". The string representation is just a big array of
1038 bytes in memory, and those bytes make up the encoded mail. The string representation is what is stored
1039 on disk or what is received from an IMAP server, for example.
1041 With the broken-down representation, the mail is *broken down* into smaller structures. For example,
1042 instead of having a single byte array for all headers, the broken-down structure has a list of individual headers,
1043 and each header in that list is again broken down into a structure. While the string representation
1044 is just an array of 7 bit characters that might be encoded, the broken-down representations contain the
1045 decoded text strings.
1047 As an example, consider the byte array
1049     "Hugo Maier" <hugo.maier@mailer.domain>
1051 Although this is just a bunch of 7 bit characters, a human immediately recognizes the broken-down structure and
1052 sees that the display name is "Hugo Maier" and that the localpart of the email address is "hugo.maier".
1053 To illustrate, the broken-down structure could be stored in a structure like this:
1055     struct Mailbox
1056     {
1057         QString displayName;
1058         QByteArray addressSpec;
1059     };
1061 The address spec actually could be broken down further into a localpart and a domain.
1062 The process of converting the string representation to a broken-down representation is called **parsing**, and
1063 the reverse is called **assembling**. Parsing a message is necessary when wanting to access or modify the broken-down
1064 structure. For example, when sending a mail,
1065 the address spec of a mailbox needs to be passed to the SMTP server, which means that the recipient headers need to
be parsed in order to access that information. Another example is the message list in a mail application, where the
1067 broken-down structure of a mail is needed
1068 to display information like subject, sender and date in the list.
1069 On the other hand, assembling a message is for example done in the composer of a mail application, where the mail information
1070 is available in a broken-down form in the composer window, and is then assembled into a final MIME message that is then sent with SMTP.
1072 Parsing is often quite tricky. You should always use the methods from KMime instead of writing parsing
1073 routines yourself. Even the simple mailbox example above is in practice difficult to parse, as many things like comments
1074 and escaped characters need to be taken into consideration.
1075 The same is true for assembling: In the above case, one could be tempted to assemble the mailbox by simply
1076 writing code like this:
1078     QByteArray stringRepresentation = '"' + displayName + "\" <" + addressSpec + ">";
1080 However, just like with parsing, you shouldn't be doing assembling yourself. In the above case, for example,
1081 the display name might contain non-ASCII characters, and RFC2047 encoding would need to be applied. So use
1082 KMime for assembling in all cases.
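To make the problem concrete, the following sketch tests whether a display name is one of the few for which the naive concatenation above would happen to produce valid output. It is an illustration of the pitfalls, not a recommendation to assemble by hand. The function name is hypothetical, and the sketch assumes the display name is held as UTF-8 in a `std::string`.

```cpp
#include <string>

// Returns true only if the display name could be placed between double quotes
// as-is. Any other name needs escaping or RFC 2047 encoding first.
bool isNaiveAssemblySafe(const std::string &displayNameUtf8)
{
    for (unsigned char c : displayNameUtf8) {
        if (c >= 0x80) {
            return false; // non-ASCII byte: RFC 2047 encoding is required
        }
        if (c == '"' || c == '\\') {
            return false; // would have to be escaped inside the quotes
        }
        if (c < 0x20 || c == 0x7f) {
            return false; // control characters, including line breaks
        }
    }
    return true;
}
```

A name such as `Hugo Maier` passes, but a name with an umlaut or with embedded quotes does not, which is why the assembling code in KMime should be used instead.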
1084 When parsing a message and assembling it afterwards, the result might not be the same as the original byte
1085 array. For example, comments in header fields are ignored during parsing and not stored in the broken-down
1086 structure, therefore the assembled message will also not contain comments.
Messages in memory are usually stored in a broken-down structure so that it is easy to access and
1089 manipulate the message. On disk and on servers, messages are stored in string representation.
1091 # Overview of KMime classes # {#classes-overview}
1093 KMime has basically two sets of classes: Classes for headers and classes for MIME
1094 parts. A MIME part is represented by `KMime::Content`. A Content can be parsed from a string representation
1095 and also be assembled from the broken-down representation again. If parsed, it has a list of sub-contents (in case of multipart contents) and a
1096 list of headers. If the Content is not parsed, it stores the headers and the body in a byte array, which can be accessed
1097 with head() and body().
1098 There is also a class `KMime::Message`, which basically is a thin wrapper around Content for the top-level
1099 MIME part. Message also contains convenience methods to access the message headers.
1101 For headers, there is a class hierarchy, with `KMime::Headers::Base` as the base class, and
1102 `KMime::Headers::Generics::Structured` and `KMime::Headers::Generics::Unstructured` in the next levels. Unstructured is
1103 for headers that don't have a defined structure, like Subject, whereas Structured headers have a
1104 specific structure, like Date. The header classes have methods to parse headers, like `from7BitString()`,
1105 and to assemble them, like `as7BitString()`. Once a header is parsed, the classes provide access to the
1106 broken-down structures; for example the `Date` header has a method `dateTime()`.
The parsing in `from7BitString()` is usually handled by a protected `parse()` function, which in turn calls
1108 parsing functions for different types, like `parseAddressList()` or `parseAddrSpec()` from the `KMime::HeaderParsing`
1109 namespace.
1111 When modifying messages, the message is first parsed into a broken-down representation. This broken-down
1112 representation can then be accessed and modified with the appropriate functions. After changing the broken-down
1113 structure, it needs to be assembled again to get the modified string representation.
KMime also comes with some codecs for handling base64 and quoted-printable encoding, with `KMime::Codec`
1116 as the base class.
1118 # RFCs # {#rfcs}
1120 * [RFC 5322](https://tools.ietf.org/html/rfc5322): Internet Message Format
1121 * [RFC 5536](https://tools.ietf.org/html/rfc5536): Netnews Article Format
1122 * [RFC 2045](https://tools.ietf.org/html/rfc2045): Multipurpose Internet Mail Extensions (MIME), Part 1: Format of Internet Message Bodies
1123 * [RFC 2046](https://tools.ietf.org/html/rfc2046): Multipurpose Internet Mail Extensions (MIME), Part 2: Media Types
1124 * [RFC 2047](https://tools.ietf.org/html/rfc2047): Multipurpose Internet Mail Extensions (MIME), Part 3: Message Header Extensions for Non-ASCII Text
1125 * [RFC 2048](https://tools.ietf.org/html/rfc2048): Multipurpose Internet Mail Extensions (MIME), Part 4: Registration Procedures
1126 * [RFC 2049](https://tools.ietf.org/html/rfc2049): Multipurpose Internet Mail Extensions (MIME), Part 5: Conformance Criteria and Examples
1127 * [RFC 2231](https://tools.ietf.org/html/rfc2231): MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
1128 * [RFC 2183](https://tools.ietf.org/html/rfc2183): Communicating Presentation Information in Internet Message: The Content-Disposition Header Field
1129 * [RFC 2557](https://tools.ietf.org/html/rfc2557): MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
1130 * [RFC 1847](https://tools.ietf.org/html/rfc1847): Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
* [RFC 3851](https://tools.ietf.org/html/rfc3851): S/MIME Version 3.1 Message Specification
1132 * [RFC 3156](https://tools.ietf.org/html/rfc3156): MIME Security with OpenPGP
1133 * [RFC 2298](https://tools.ietf.org/html/rfc2298): An Extensible Message Format for Message Disposition Notifications
1134 * [RFC 2646](https://tools.ietf.org/html/rfc2646): The Text/Plain Format Parameter (not supported by KMime)
1136 # Further Reading # {#section}
1138 * [Wikipedia article on MIME](https://en.wikipedia.org/wiki/MIME)
1139 * [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/articles/Unicode.html)
1140 * [A tutorial on character code issues](https://www.cs.tut.fi/~jkorpela/chars.html)
1141 * [Online Base64 encoder and decoder](https://www.motobit.com/util/base64-decoder-encoder.asp)
1142 * [Online quoted-printable encoder](https://www.motobit.com/util/quoted-printable-encoder.asp)
* [Online quoted-printable decoder](https://www.motobit.com/util/quoted-printable-decoder.asp)
1144 * [Online charset converter](https://www.motobit.com/util/charset-codepage-conversion.asp)
1145 * [Wikipedia article on public-key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography)