0001 # KMime #
0002 
0003 [TOC]
0004 
0005 # Introduction # {#introduction}
0006 
0007 KMime is a library for handling mail messages and newsgroup articles. Both mail messages and
0008 newsgroup articles are based on the same standard called MIME, which stands for
0009 **Multipurpose Internet Mail Extensions**. In this document, the term *message* is used to
0010 refer to both mail messages and newsgroup articles.
0011 
KMime deals solely with the in-memory representation of messages. Topics such as transport or storage
of messages are handled by other libraries, for example by [the mailtransport library](https://api.kde.org/kdepim/kmailtransport/html/index.html)
or by [the KIMAP library](https://api.kde.org/kdepim/kimap/html/index.html).
Similarly, this library does not deal with displaying messages or advanced composing; for those, there
are the messageviewer and messagecomposer
components in the KDE PIM [messagelib](https://api.kde.org/kdepim/messagelib/html/index.html) module.
0018 
KMime's main function is to parse, modify and assemble messages in memory. *Parsing* and
*assembling* are explained in a [later section](@ref string-broken-down).
KMime provides high-level classes that make these tasks easy.
0022 
0023 MIME is defined by various RFCs, see the [RFC section](@ref rfcs) for a list of them.
0024 
0025 # Structure of this document # {#structure}
0026 
0027 This document will first give an [introduction to the MIME specification](@ref mime-intro), as it is
0028 essential to understand the basics of the structure of MIME messages for using this library.
0029 The introduction here is aimed at users of the library. It gives a broad overview with examples and
0030 omits some details. Developers who wish to modify KMime should read the
0031 [corresponding RFCs](@ref rfcs) as well, but this is not necessary for library users.
0032 
0033 After the introduction to the MIME format, the two ways of representing a message in memory are
0034 discussed, the [string representation and the broken down representation](@ref string-broken-down).
0035 
0036 This is followed by a section giving an 
0037 [overview of the most important KMime classes](@ref classes-overview).
0038 
0039 The last sections give a list of [relevant RFCs](@ref rfcs) and provide links for
0040 [further reading](@ref links).
0041 
0042 # Structure of MIME messages # {#mime-intro}
0043 
0044 ## A brief history of the MIME standard ## {#history}
0045 
The MIME standard is relatively new (1993); email and Usenet existed long before it came into
existence. Because of this, the MIME standard has to maintain backwards compatibility. The email
standard before MIME lacked many capabilities, such as encodings other than ASCII, or attachments. These
and other things were later added by MIME. The standard for messages before MIME is defined in
[RFC 5322](https://tools.ietf.org/html/rfc5322). In [RFC 2045](https://tools.ietf.org/html/rfc2045)
to [RFC 2049](https://tools.ietf.org/html/rfc2049), several backward-compatible extensions
to the basic message format are defined, adding support for attachments, different encodings and many
other features.
0054 
0055 Actually, there is an even older standard, defined in [RFC 733](https://tools.ietf.org/html/rfc733)
0056 (*Standard for the format of ARPA network text messages*, introduced in 1977).
0057 This standard is now obsoleted by RFC 5322, but backwards compatibility is in some cases supported, as
0058 there are still messages in this format around.
0059 
0060 Since pre-MIME messages had no way to handle attachments, attachments were sometimes added to the message
0061 text in an [uuencoded](https://en.wikipedia.org/wiki/Uuencoding) form. Although this is also
0062 obsolete, reading uuencoded attachments is still supported by KMime.
0063 
0064 After MIME was introduced, people realized that there was no way to have the filename of attachments
0065 encoded in anything other than ASCII. Thus, [RFC 2231](https://tools.ietf.org/html/rfc2231)
0066 was introduced to allow arbitrary encodings for parameter values, such as the attachment filename.
0067 
0068 ## MIME by examples ## {#examples}
0069 
0070 In the following sections, MIME message examples are shown, examined and explained, starting with
0071 a simple message and proceeding to more interesting examples.
0072 You can get additional examples by simply viewing the source of your own messages in your mail client,
0073 or by having a look at the examples in the [various RFCs](@ref rfcs).
0074 
0075 ### A simple message ### {#simple-email}
0076 
0077     Subject: First Mail
0078     From: John Doe <john.doe@domain.com>
0079     Date: Sun, 21 Feb 2010 19:16:11 +0100
0080     MIME-Version: 1.0
0081     
0082     Hello World!
0083 
The above example features a very simple message. The two main parts of this message are the **header**
and the **body**, which are separated by an empty line. The body contains the actual message content,
and the header contains metadata about the message itself. The header consists of several **header fields**,
each on its own line. A header field is made up of the **header field name**, followed by a colon, followed
by the **header field body**.
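
The split into header, body and header fields can be sketched with plain string handling. The following is a minimal illustration in Python, not KMime API: the helper name `split_message` is made up for this example, and the sketch ignores header fields that are folded across several lines.

```python
RAW = (
    "Subject: First Mail\n"
    "From: John Doe <john.doe@domain.com>\n"
    "Date: Sun, 21 Feb 2010 19:16:11 +0100\n"
    "MIME-Version: 1.0\n"
    "\n"
    "Hello World!\n"
)


def split_message(raw):
    """Split a message into a list of (name, body) header fields and the body."""
    # The first empty line separates the header from the body.
    header, _, body = raw.partition("\n\n")
    fields = []
    for line in header.split("\n"):
        # A header field is "name: body"; split at the first colon only,
        # because the field body (e.g. a time) may contain colons itself.
        name, _, field_body = line.partition(":")
        fields.append((name, field_body.strip()))
    return fields, body


fields, body = split_message(RAW)
print(fields[0])  # ('Subject', 'First Mail')
print(body)       # Hello World!
```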
0089 
The **MIME-Version** header field is mandatory for MIME messages. **Subject**,
**From** and **Date** are important header fields; they are usually displayed in the message list of a
mail client. The `Subject` header field can contain anything; it does not have a special structure. It is a
so-called **unstructured** header field. In contrast, the `From` and the `Date` header fields have
to follow a special structure: they must be formed in a way that machines can parse. They are **structured**
header fields. For example, a mail client needs to understand
the `Date` header field so that it can sort the messages by date in the message list.
The exact details of how the header field bodies of structured header fields should be
formed are specified in [RFC 5322](https://tools.ietf.org/html/rfc5322).
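
To see why a structured `Date` field matters for sorting, the following sketch compares sorting the raw field bodies as text with sorting them after parsing. It uses `email.utils.parsedate_to_datetime` from the Python standard library purely as an illustration; it does not show how KMime parses dates. The two dates are taken from the examples in this document.

```python
from email.utils import parsedate_to_datetime

date_fields = [
    "Mon, 22 Feb 2010 00:42:45 +0100",
    "Sun, 21 Feb 2010 19:16:11 +0100",
]

# Sorting the raw strings compares letters, so "Mon" sorts before "Sun".
by_text = sorted(date_fields)

# Parsing the structured field body first gives the chronological order.
by_date = sorted(date_fields, key=parsedate_to_datetime)

print(by_text[0])  # Mon, 22 Feb 2010 00:42:45 +0100
print(by_date[0])  # Sun, 21 Feb 2010 19:16:11 +0100
```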
0099 
In this example, the `From` header contains a single email address. More precisely, a single email address is called
a **mailbox**, which is made up of the **display name** (John Doe) and the **address specification** (john.doe@domain.com),
which is enclosed in angle brackets. The `addr-spec` consists of the user name, called the **local part**,
and the **domain** name, separated by an `@`.
0104 
Many header fields can contain multiple email addresses. For example, the `To` field for messages with
multiple recipients can have a comma-separated list of mailboxes.
A list of mailboxes, together with a display name for the list, forms a **group**, and multiple groups can form an
**address list**. Groups are rarely used, however; you'll most often see a simple list of plain mailboxes.
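
The parts of a mailbox can be made visible with a few lines of Python. This is only a sketch using `email.utils` from the Python standard library, not KMime API; the second recipient `jane@domain.com` is made up for the example.

```python
from email.utils import getaddresses, parseaddr

# A single mailbox: display name plus addr-spec in angle brackets.
display_name, addr_spec = parseaddr("John Doe <john.doe@domain.com>")
local_part, _, domain = addr_spec.rpartition("@")
print(display_name)        # John Doe
print(local_part, domain)  # john.doe domain.com

# A comma-separated list of mailboxes, as found in a `To` field body.
to_body = "John Doe <john.doe@domain.com>, jane@domain.com"
mailboxes = getaddresses([to_body])
print(mailboxes)
# [('John Doe', 'john.doe@domain.com'), ('', 'jane@domain.com')]
```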
0109 
0110 There are many more possible header fields than shown in this example, and the header can even contain
0111 arbitrary header fields, which usually are prefixed with `X-`, like `X-Face`.
0112 
0113 ### Encodings and charsets ### {#encodings}
0114 
0115     From: John Doe <john.doe@domain.com>
0116     Date: Mon, 22 Feb 2010 00:42:45 +0100
0117     MIME-Version: 1.0
0118     Content-Type: Text/Plain;
0119       charset="iso-8859-1"
0120     Content-Transfer-Encoding: quoted-printable
0121     
0122     Gr=FCezi Welt!
0123 
0124 The above shows a message that is using a different **charset** than the standard **US-ASCII** charset. The
0125 message body contains the string "Grüezi Welt!", which is **encoded** in a special way.
0126 
0127 The **content-type** of this message is **text/plain**, which means that the message is simple text. Later,
0128 other content types will be introduced, such as **text/html**. If there is no `Content-Type` header
0129 field, it is assumed that the content-type is `text/plain`.
0130 
Before MIME was introduced, all messages were limited to the US-ASCII charset. Only the
lower 128 byte values (0 to 127) were allowed to be used, the so-called **7-bit** range. Writing a message in
another charset or using the upper 128 byte values (128 to 255) was not allowed.
0134 
0135 #### Charset Encoding ####
0136 
0137 When talking about charsets, it is important to understand how strings of text are converted to
0138 byte arrays, and the other way around. A message is nothing else than a big array of bytes.
0139 The bytes that form the body of the message somehow need to be interpreted as a text string. Interpreting
0140 a byte array as a text string is called **decoding** the text. Converting a text string to a byte array is called
0141 **encoding** the text. A **codec** (**co**der-**dec**oder) is a utility that can encode and decode text.
0142 In Qt, the class for text strings is QString, and the class for byte arrays is QByteArray.
0143 
With the US-ASCII charset, encoding and decoding text is easy: one just has to look at an [ASCII table](https://en.wikipedia.org/wiki/ASCII_table)
to be able to convert text strings to byte arrays and byte arrays to text strings. For
example, the letter 'A' is represented by a single byte with the value of 65. When encountering a byte
with the value 84, we can look that up in the table and see that it represents the letter 'T'.
With the US-ASCII charset, each letter is represented by exactly one byte, which is very convenient.
Even better, all letters commonly used in English text have byte values below 128, so the 7-bit limit
of messages is no problem for text encoded with the US-ASCII charset.
Another example: The string "Hello World!" is represented by the following byte array:

    48 65 6C 6C 6F 20 57 6F 72 6C 64 21
0154 
0155 Note that the byte values are written in hexadecimal form here, not in decimal as earlier.
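
The table lookups above can be reproduced in a few lines. The sketch uses Python's built-in `str.encode` and `bytes.decode`, which play the role that QString and QByteArray conversions play in Qt; it is an illustration only.

```python
text = "Hello World!"

# Encoding: text string -> byte array.
data = text.encode("ascii")
print(data.hex(" ").upper())  # 48 65 6C 6C 6F 20 57 6F 72 6C 64 21
print(data[0])                # 72, which is 0x48, the letter 'H'

# Decoding: byte array -> text string.
print(bytes([65, 84]).decode("ascii"))  # AT
```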
0156 
0157 Now, what if we want to write a message that contains German umlauts or Chinese letters? Those
0158 are not in the ASCII table, therefore a different charset has to be used. There is a wealth of charsets
0159 to choose from. Not all charsets can handle all letters, for example the
0160 [ISO-8859-1](https://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1) charset can handle
0161 German umlauts, but cannot handle Chinese or Arabic letters. The [Unicode standard](https://en.wikipedia.org/wiki/Unicode)
0162 is an attempt to introduce charsets that can handle all known letters in the
0163 world, in all languages. Unicode actually has several charsets, for example [UTF-8](https://en.wikipedia.org/wiki/UTF-8)
0164 and [UTF-16](https://en.wikipedia.org/wiki/UTF-16). In an ideal world, everyone would be using
0165 Unicode charsets, but for historic and legacy reasons, other charsets are still much in use.
0166 
Charsets other than US-ASCII generally don't have such nice properties: A single letter can be represented
by multiple bytes, and generally the byte values are not in the 7-bit range. Pay attention to the UTF-8
charset: At first glance, it looks exactly like the US-ASCII charset, as common Latin letters like A - Z
are encoded with the same byte values as with US-ASCII. However, letters other than A - Z are suddenly
encoded with two or even more bytes. In general, one letter can be encoded in an arbitrary number of bytes, depending
on the charset. One can **not** rely on the `1 letter == 1 byte` assumption.
0173 
0174 Now, what should be done when the text string "Grüezi Welt!" should be sent in the body of a message?
0175 The first step is to choose a charset that can represent all of its letters. This already excludes US-ASCII.
0176 Once a charset is chosen, the text string is encoded into a byte array.
0177 "Grüezi Welt!" encoded with the ISO-8859-1 charset produces the following byte array:
0178 
0179     47 72 FC 65 7A 69 20 57 65 6C 74 21
0180 
0181 The letter 'ü' here is encoded using a single byte with the value `FC`.
0182 The same string encoded with UTF-8 looks slightly different:
0183 
0184     47 72 C3 BC 65 7A 69 20 57 65 6C 74 21
0185 
0186 Here, the letter 'ü' is encoded with two bytes, `C3 BC`. Still, one can see the similarity
0187 between the two charsets for the other letters.
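
The two byte arrays can be reproduced as follows. This is a small Python sketch for illustration; the variable names are arbitrary.

```python
text = "Grüezi Welt!"

latin1 = text.encode("iso-8859-1")
utf8 = text.encode("utf-8")

print(latin1.hex(" ").upper())  # 47 72 FC 65 7A 69 20 57 65 6C 74 21
print(utf8.hex(" ").upper())    # 47 72 C3 BC 65 7A 69 20 57 65 6C 74 21

# 12 letters, but 12 bytes in ISO-8859-1 and 13 bytes in UTF-8.
print(len(text), len(latin1), len(utf8))  # 12 12 13
```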
0188 
0189 You can try this out yourself: Open your favorite text editor and enter some text with non-latin
0190 letters. Then save the file and view it in a hex editor to see how the text was converted to a
0191 byte array. Make sure to try out setting different charsets in your text editor.
0192 
0193 At this point, the text string is successfully converted to a byte array, using e.g. the ISO-8859-1
0194 charset. To indicate which charset was used, a **Content-Type** header field has to be added, with the correct
0195 **charset** parameter. In our example above, that was done. If the charset parameter of the `Content-Type`,
0196 or even the complete `Content-Type` header field is left out, the receiver can not know how to interpret
0197 the byte array! In these cases, the byte array is usually decoded incorrectly, and the text strings contain
0198 wrong letters or lots of question marks. There is even a special term for such wrongly decoded text,
0199 [Mojibake](https://en.wikipedia.org/wiki/Mojibake). It is important to always know what charset
0200 your byte array is encoded with, otherwise an attempt at decoding the byte array into a text string will fail and produce
0201 Mojibake. **There is no such thing as plain text!** If there is no `Content-Type` header field in
0202 a message, the message body should be interpreted as US-ASCII.
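
Mojibake is easy to produce on purpose. The sketch below decodes the same bytes with the right and with the wrong charset; it is a Python illustration and does not show how any particular mail client behaves.

```python
utf8_bytes = "Grüezi Welt!".encode("utf-8")

right = utf8_bytes.decode("utf-8")
wrong = utf8_bytes.decode("iso-8859-1")  # wrong charset, but no error is raised
print(right)  # Grüezi Welt!
print(wrong)  # GrÃ¼ezi Welt!

# The other way around: ISO-8859-1 bytes are not valid UTF-8, so a lenient
# decoder substitutes the replacement character U+FFFD for the byte FC.
latin1_bytes = "Grüezi Welt!".encode("iso-8859-1")
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced == "Gr\ufffdezi Welt!")  # True
```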
0203 
0204 To learn more about charsets and encodings, read 
0205 [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/articles/Unicode.html)
0206 and [A tutorial on character code issues](https://www.cs.tut.fi/~jkorpela/chars.html). Especially
0207 the first article should really be read, as the name indicates.
0208 
0209 #### Content Transfer Encoding ####
0210 
Now, we can't use the byte array that was just created in a message. The string encoded with ISO-8859-1
has the byte value `FC` for the letter 'ü', which is decimal value 252. However, as said earlier,
messages are only valid when all bytes are in the 7-bit range, i.e. have a byte value of at most 127.
So what should we do with byte values that are greater than 127? How can they be added to messages? The solution
is to use a **content transfer encoding** (CTE). A content transfer encoding takes a byte
array as input and transforms it. The output is another byte array, but one which only uses byte values
in the 7-bit range. One such content transfer encoding is **quoted-printable** (QTP), which is used in the
above example. Quoted-printable is easy to understand: When encountering a byte that has a value greater
than 127, it is simply replaced by a '=', followed by the hexadecimal code of the byte value, represented
as letters and digits encoded with ASCII. This means
that a byte with the value 252 is replaced with the ASCII string `=FC`, since `FC`
is the hexadecimal value of 252. The ASCII string `=FC` itself is three bytes long,
`3D 46 43`. Therefore, the quoted-printable encoding replaces each byte outside of the 7-bit
range with 3 new bytes. A literal '=' in the input is encoded in the same way, as `=3D`, so that the decoder
does not mistake it for the start of an encoded byte. Decoding quoted-printable encoding is also easy: Each time a byte with the value
`3D`, which is the letter '=' in ASCII, is encountered, the next two following bytes are interpreted
as the hex value of the resulting byte. The quoted-printable encoding was invented to make reading the
byte array easy for humans.
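
The replacement rule can be sketched in a few lines of Python. The function `encode_8bit_bytes` is a made-up, deliberately simplified encoder: it only handles bytes above 127 and the '=' sign, and leaves out the other rules of the real encoding, such as the line length limit. The standard-library `quopri` module is used only to check that a real decoder accepts the result.

```python
import quopri


def encode_8bit_bytes(data):
    """Replace '=' and every byte above 127 with '=XX' (simplified sketch)."""
    out = bytearray()
    for byte in data:
        if byte > 127 or byte == 0x3D:  # 0x3D is '='
            out += b"=%02X" % byte
        else:
            out.append(byte)
    return bytes(out)


original = "Grüezi Welt!".encode("iso-8859-1")
encoded = encode_8bit_bytes(original)
print(encoded)                       # b'Gr=FCezi Welt!'
print(len(original), len(encoded))   # 12 14

# A standard decoder turns the result back into the original byte array.
print(quopri.decodestring(encoded) == original)  # True
```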
0228 
0229 The quoted-printable encoding is not a good choice when the input byte array contains lots of bytes
0230 outside the 7-bit range, as the resulting byte array will be three times as big in the worst case,
0231 which is a waste of space. Therefore another content transfer encoding was introduced, **Base64**.
0232 The details of the base64 encoding are too much to write about here; refer to the
0233 [Wikipedia article](https://en.wikipedia.org/wiki/Base64) or the [RFC](https://tools.ietf.org/html/rfc2045#section-6.8)
0234 for details. As an example, the ISO-8859-1 encoded text string "Grüezi Welt!" is, after encoding it with base64,
0235 represented by the following ASCII string: `R3L8ZXppIFdlbHQh`.
0236 To express the same in byte arrays: The byte array `47 72 FC 65 7A 69 20 57 65 6C 74 21`
0237 is, after encoding it with base64,
0238 represented by the byte array `52 33 4C 38 5A 58 70 70 49 46 64 6C 62 48 51 68`.
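
These values can be checked with the `base64` module of the Python standard library. The snippet is only an illustration of the encoding, not of KMime's codec classes.

```python
import base64

latin1 = "Grüezi Welt!".encode("iso-8859-1")
encoded = base64.b64encode(latin1)

print(encoded.decode("ascii"))   # R3L8ZXppIFdlbHQh
print(encoded.hex(" ").upper())  # 52 33 4C 38 5A 58 70 70 49 46 64 6C 62 48 51 68

# Every 3 input bytes become 4 output bytes: 12 bytes in, 16 bytes out.
print(len(latin1), len(encoded))  # 12 16

# The same text in UTF-8 is 13 bytes long, so the output needs padding.
utf8_encoded = base64.b64encode("Grüezi Welt!".encode("utf-8"))
print(utf8_encoded.decode("ascii"))  # R3LDvGV6aSBXZWx0IQ==
```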
0239 
There are two other content transfer encodings besides quoted-printable and base64: **7-bit** and
**8-bit**, written as `7bit` and `8bit` in the header field. 7-bit is just a marker to indicate that no content transfer encoding is used. This is the
case when the byte array is already completely in the 7-bit range, for example when writing English
text using the US-ASCII charset. 8-bit is also a marker to indicate that no content transfer encoding
was used. This time, the reason is not that encoding was unnecessary, but that, by special exception, byte values
outside of the 7-bit range are allowed. For example, some SMTP servers support the
[8BITMIME](https://tools.ietf.org/html/rfc1652) extension, which indicates that they accept
bytes outside of the 7-bit range. In this case, one can simply use the byte arrays as-is, without using
any content transfer encoding. Creating messages with 8-bit content transfer encoding is currently not
supported by KMime. The advantage of 8-bit is that there is no overhead in size, unlike with
base64 or even quoted-printable.

When using one of the four content transfer encodings, i.e. quoted-printable, base64, 7-bit or 8-bit, this
has to be indicated in the header field **Content-Transfer-Encoding**. If the header field is left out,
it is assumed that the content transfer encoding is 7-bit. The example above uses quoted-printable.
0255 
0256     From: John Doe <john.doe@domain.com>
0257     Date: Mon, 22 Feb 2010 00:42:45 +0100
0258     MIME-Version: 1.0
0259     Content-Type: Text/Plain;
0260       charset="iso-8859-1"
0261     Content-Transfer-Encoding: base64
0262     
0263     R3L8ZXppIFdlbHQh
0264 
0265 The same example, this time encoded with the base64 content transfer encoding.
0266 
0267     From: John Doe <john.doe@domain.com>
0268     Date: Mon, 22 Feb 2010 00:42:45 +0100
0269     MIME-Version: 1.0
0270     Content-Type: Text/Plain;
0271       charset="utf-8"
0272     Content-Transfer-Encoding: base64
0273     
0274     R3LDvGV6aSBXZWx0IQ==
0275 
0276 Again the same example, this time using UTF-8 as the charset.
0277 
0278     From: John Doe <john.doe@domain.com>
0279     Date: Mon, 22 Feb 2010 00:42:45 +0100
0280     MIME-Version: 1.0
0281     Content-Type: Text/Plain;
0282       charset="utf-8"
0283     Content-Transfer-Encoding: quoted-printable
0284 
0285     Gr=C3=BCezi Welt!
0286 
The example with a combination of UTF-8 and quoted-printable CTE. As shown earlier, with the
UTF-8 charset, the letter 'ü' is represented by the two bytes `C3 BC`.
0289 
    From: John Doe <john.doe@domain.com>
    Date: Mon, 22 Feb 2010 00:42:45 +0100
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="utf-8"
    Content-Transfer-Encoding: 7bit
    
    Hello World
0298 
0299 A different example, showing 7-bit content transfer encoding. Although the UTF-8 charset has lots
0300 of letters that are represented by bytes outside of the 7-bit range, the string "Hello World" can
0301 be fully represented in the 7-bit range here, even with UTF-8.
0302 
0303 In the [further reading](@ref links) section, you will find links to web applications that demonstrate
0304 encodings and charsets.
0305 
0306 #### Conclusion ####
0307 
0308 When adding a text string to the body of a message, it needs to be encoded twice: First, the encoding of the charset
0309 needs to be applied, which transforms the text string into a byte array. Afterwards, the content transfer
0310 encoding has to be applied, which transforms the byte array from the first step into a byte array that
0311 only has bytes in the 7-bit range.
0312 
When decoding, the same has to be done in reverse: One first has to decode the byte array with the content transfer encoding, to get a byte
array that can contain all 256 possible byte values. Afterwards, the resulting byte array needs to be decoded
with the correct charset, to transform it into a text string. For those two decoding steps, one has to
look at the `Content-Type` and the `Content-Transfer-Encoding` header fields to find the correct
charset and CTE for decoding.
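
The two decoding steps can be written down explicitly. The following Python sketch uses the standard-library `email` parser only to read the two header fields. The function `decode_body` and the `HEADER` template are made up for this example, and only single-part text messages with the `7bit`, `quoted-printable` and `base64` encodings are handled.

```python
import base64
import quopri
from email import message_from_string


def decode_body(raw_message):
    """Return the body of a single-part text message as a text string."""
    msg = message_from_string(raw_message)
    transfer_encoded = msg.get_payload().encode("ascii")

    # Step 1: undo the content transfer encoding (7-bit bytes -> any bytes).
    cte = (msg["Content-Transfer-Encoding"] or "7bit").strip().lower()
    if cte == "quoted-printable":
        data = quopri.decodestring(transfer_encoded)
    elif cte == "base64":
        data = base64.b64decode(transfer_encoded)
    else:  # 7bit: the bytes are used as-is
        data = transfer_encoded

    # Step 2: undo the charset encoding (bytes -> text string).
    charset = msg.get_content_charset("us-ascii")
    return data.decode(charset)


HEADER = (
    "MIME-Version: 1.0\n"
    'Content-Type: Text/Plain;\n  charset="{}"\n'
    "Content-Transfer-Encoding: {}\n"
    "\n"
)
qp_utf8 = HEADER.format("utf-8", "quoted-printable") + "Gr=C3=BCezi Welt!\n"
b64_latin1 = HEADER.format("iso-8859-1", "base64") + "R3L8ZXppIFdlbHQh\n"

print(decode_body(qp_utf8).strip())     # Grüezi Welt!
print(decode_body(b64_latin1).strip())  # Grüezi Welt!
```

Both messages carry different bytes on the wire, yet they decode to the same text string, because each decoding step uses the value announced in the corresponding header field.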
0318 
0319 It is important to always keep the charset and the content transfer encoding in mind. Byte arrays and
0320 strings are not to be confused. Byte arrays that are encoded with a CTE are not to be confused with
0321 byte arrays that are **not** encoded with a CTE.
0322 
0323 This section showed how to use different charsets in the *body* of a message. The next section will
0324 show what to do when another charset is needed in one of the *header* field bodies.
0325 
0326 ### Encoding in Header Fields ### {#header-encoding}
0327 
0328 In the last section, we discussed how to use different charsets in the body of a message. But what if
0329 a different charset needs to be added to one of the header fields? For example one might want to write
0330 a mail to a mailbox with the display name "András Manţia" and with the subject "Grüezi!".
0331 
0332 The header fields are limited to characters in the 7-bit range, and are interpreted as US-ASCII.
0333 That means the header field names, such as "From: ", are all encoded in US-ASCII. The header field
0334 bodies, such as the "1.0" of `MIME-Version`, are also encoded with US-ASCII. This is mandated by
0335 [the RFC](https://tools.ietf.org/html/rfc5322#section-2).
0336 
The `Content-Type` and the `Content-Transfer-Encoding` header fields only apply to the message body;
they have no meaning for other header fields.
0339 
0340 This means that any letter in a different charset has to be encoded in some way to satisfy the RFC.
0341 Letters with a different charset are only allowed in some of the header field bodies; the header field
0342 names always have to be in US-ASCII.
0343 
0344     From: Thomas McGuire <thomas@domain.com>
0345     Subject: =?iso-8859-1?q?Gr=FCezi!?=
0346     Date: Mon, 22 Feb 2010 14:34:01 +0100
0347     MIME-Version: 1.0
0348     To: =?utf-8?q?Andr=C3=A1s?= =?utf-8?q?_Man=C5=A3ia?= <andras@domain.com>
0349     Content-Type: Text/Plain;
0350       charset="us-ascii"
0351     Content-Transfer-Encoding: 7bit
0352     
0353     bla bla bla
0354 
The above example shows how text that is encoded with a different charset than US-ASCII is handled
in the message header. This can be seen in the bodies of the `Subject` header field and the `To` header field.
In this example, the body of the message is unimportant; it is just "bla bla bla" in US-ASCII.
The way the header field bodies are encoded is sometimes referred to as an **RFC2047 string** or as an **encoded word**.
The former name has its origin in the [RFC](https://tools.ietf.org/html/rfc2047) where this encoding scheme is defined.
RFC2047 strings are only allowed in some of the header fields, like `Subject`, and in the display name
of mailboxes in header fields like `From` and `To`. In other header fields, such as `Date` and
`MIME-Version`, they are not allowed, but they wouldn't make much sense there anyway, since those are
structured header fields with a clearly defined structure.
0364 
RFC2047 strings start with "=?" and end with "?=". Between those markers, they consist of three parts:

* The charset, such as "iso-8859-1"
* The encoding, which is "q" or "b"
* The encoded text
0369 
0370 These three parts are separated with a '?'. Encoding the third part, the text, is very similar to how
0371 text strings in the message body are encoded: First, the text string is encoded to a byte array using
0372 the charset encoding. Afterwards, the second encoding is used on the result, to ensure that all resulting
0373 bytes are within the 7-bit range.
0374 
0375 The *second encoding* here is almost identical to the content transfer encoding. There are two
0376 possible encodings, **b** and **q**. The `b` encoding is the same as the base64 encoding of the content
0377 transfer encoding. The `q` encoding is very similar to the quoted-printable encoding of the content
0378 transfer encoding, but with some little differences that are described in
0379 [the RFC](https://tools.ietf.org/html/rfc2047#section-4.2).
0380 
0381 Let's examine the subject of the message, `=?iso-8859-1?q?Gr=FCezi!?=`, in detail:
0382 
The first part of the RFC2047 string is the charset, so it is ISO-8859-1 in this case. The second part
is the encoding, which is the `q` encoding here. The last part is the encoded text, which is
`Gr=FCezi!`. As with the quoted-printable encoding, "=FC" is the encoding for the byte with
the value `FC`, which in the ISO-8859-1 charset is the letter 'ü'. The complete decoded
text is therefore "Grüezi!".
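
The same walk-through can be expressed as code. The function `decode_encoded_word` below is a made-up Python sketch that handles exactly one encoded word and no surrounding text; `quopri` and `base64` come from the Python standard library. The `b`-encoded subject at the end was constructed for this example and does not appear in the message above.

```python
import base64
import quopri


def decode_encoded_word(word):
    """Decode a single RFC 2047 encoded word such as '=?utf-8?q?...?='."""
    # Splitting at '?' yields: '=', charset, encoding, encoded text, '='.
    start, charset, encoding, encoded_text, end = word.split("?")
    if (start, end) != ("=", "="):
        raise ValueError("not an encoded word: %r" % word)
    raw = encoded_text.encode("ascii")
    if encoding.lower() == "q":
        # header=True makes the decoder treat '_' as a space.
        data = quopri.decodestring(raw, header=True)
    elif encoding.lower() == "b":
        data = base64.b64decode(raw)
    else:
        raise ValueError("unknown encoding: %r" % encoding)
    # The charset tells us how to turn the bytes into a text string.
    return data.decode(charset)


print(decode_encoded_word("=?iso-8859-1?q?Gr=FCezi!?="))    # Grüezi!
print(decode_encoded_word("=?iso-8859-1?b?R3L8ZXppIQ==?="))  # Grüezi!
```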
0388 
0389 Each RFC2047 string in the header can use a different charset: In this example, the `Subject` uses ISO-8859-1,
0390 `To` uses UTF-8 and the message body uses US-ASCII.
0391 
In the `To` header field, two RFC2047 strings are used. A single, bigger, RFC2047 string for the whole
display name could also have been used. In this case, the second RFC2047 string starts with an underscore,
which is decoded as a space in the `q` encoding. The space between the two RFC2047 strings is ignored;
it is just used to separate the two encoded words.
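
How the two encoded words combine can be checked with `email.header.decode_header` from the Python standard library. The helper `decode_rfc2047` is made up for this sketch; it is an illustration and does not show KMime's own decoding functions.

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode all encoded words in a header field body to one text string."""
    chunks = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        chunks.append(chunk)
    return "".join(chunks)


display_name = "=?utf-8?q?Andr=C3=A1s?= =?utf-8?q?_Man=C5=A3ia?="

# The space between the encoded words is dropped; the '_' becomes a space.
print(decode_rfc2047(display_name))  # András Manţia
```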
0396 
There are some restrictions on RFC2047 strings: They are not allowed to be longer than 75 characters,
which means two or more encoded words have to be used for long text strings. Also, there are some
restrictions on where RFC2047 strings are allowed; most importantly, the address specification must
not be encoded, to be backwards compatible. For further details, refer to the RFC.
0401 
0402 ### Messages with attachments ### {#multipart-mixed}
0403 
0404 Until now, we only looked at messages that had a single text part as the message body. In this section,
0405 we'll examine messages with attachments.
0406 
0407     From: frank@domain.com
0408     To: greg@domain.com
0409     Subject: Nice Photo
0410     Date: Sun, 28 Feb 2010 19:57:00 +0100
0411     MIME-Version: 1.0
0412     Content-Type: Multipart/Mixed;
0413       boundary="Boundary-00=_8xriL5W6LSj00Ly"
0414     
0415     --Boundary-00=_8xriL5W6LSj00Ly
0416     Content-Type: Text/Plain;
0417       charset="us-ascii"
0418     Content-Transfer-Encoding: 7bit
0419     
0420     Hi Greg,
0421     
0422     attached you'll find a nice photo.
0423     
0424     --Boundary-00=_8xriL5W6LSj00Ly
0425     Content-Type: image/jpeg;
0426       name="test.jpeg"
0427     Content-Transfer-Encoding: base64
0428     Content-Disposition: attachment;
0429       filename="test.jpeg"
0430     
0431     /9j/4AAQSkZJRgABAQAAAQABAAD/4Q3XRXhpZgAASUkqAAgAAAAHAAsAAgAPAAAAYgAAAAABBAAB
0432     [SNIP 800 lines]
0433     ze5CdSH2Z8yTatHSV2veW0rKzeq30//Z
0434     
0435     --Boundary-00=_8xriL5W6LSj00Ly--
0436 
0437 *Note: Since the image in this message would be really big, most of it is omitted / snipped here.*
0438 
The above example consists of two parts: A normal text part and an image attachment. Messages that
consist of multiple parts are called **multipart** messages. The top-level content-type therefore is
**multipart/mixed**. `Mixed` simply means that the following parts have no relation to each other;
it is just a random mixture of parts. Later, we will look at other types, such as `multipart/alternative`
or `multipart/related`. A **part** is sometimes also called **node**, **content** or **MIME part**.
0444 
0445 Each MIME part of the message is separated by a **boundary**, and that boundary
0446 is specified in the top-level content-type header as a parameter. In the message body, the boundary
0447 is prefixed with `"--"`, and the last boundary is suffixed with `"--"`, so that the end of the message can
0448 be detected. When creating a message, care must be taken that the boundary appears nowhere else in the
0449 message, for example in the text part, as the parser would get confused by this.
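
The role of the boundary lines can be sketched as follows. The function `split_multipart_body` is a made-up, simplified Python illustration: it only compares whole lines and ignores details that a real parser has to handle, such as trailing whitespace after the boundary. The boundary `B` and the two tiny parts are invented for the example.

```python
def split_multipart_body(body, boundary):
    """Return the raw MIME parts found between the boundary lines."""
    delimiter = "--" + boundary
    parts = []
    current = None  # lines of the part being collected, None before the first boundary
    for line in body.split("\n"):
        if line == delimiter or line == delimiter + "--":
            if current is not None:
                parts.append("\n".join(current))
            if line == delimiter + "--":
                break  # the closing boundary ends the multipart body
            current = []
        elif current is not None:
            current.append(line)
    return parts


BODY = (
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "first\n"
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "second\n"
    "--B--\n"
)
parts = split_multipart_body(BODY, "B")
print(len(parts))  # 2
print(parts[1])
```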
0450 
0451 A MIME part begins right after the boundary. It consists of a **MIME header** and a **MIME body**, which
0452 are separated by an empty line. The MIME header should not be confused with the message header: The
0453 message header contains metadata about the whole message, like subject and date. The MIME header only
0454 contains metadata about the specific MIME part, like the content type of the MIME part. MIME header
0455 field names always start with `"Content-"`.
0456 The example above shows the three most important MIME header fields. Usually those are the only ones
0457 used. The top-level header of a message actually mixes the message metadata and the MIME metadata into one header: In this
0458 example, the header contains the `Date` header field, which is an ordinary header field, and it contains
0459 the `Content-Type` header field, which is a MIME header field.
0460 
0461 MIME parts can be nested, and therefore form a tree. The above example has the following tree:
0462 
0463     multipart/mixed
0464     |- text/plain
0465     \- image/jpeg
0466 
The `text/plain` node is therefore a **child** of the `multipart/mixed` node. The `multipart/mixed` node
is the **parent** of the other two nodes. The `image/jpeg` node is a **sibling** of the `text/plain` node.
`Multipart` nodes are the only nodes that have children; other nodes are **leaf** nodes.
The body of a multipart node consists of all complete child nodes (MIME header and MIME body), separated
by the boundary.
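
The tree can be reproduced by parsing the example message and walking over its nodes. The sketch below uses the `email` package of the Python standard library instead of KMime, purely for illustration. The function `content_tree` is made up, and the image data is cut down to the first few bytes of the snipped example.

```python
from email import message_from_string

RAW_MESSAGE = """\
From: frank@domain.com
To: greg@domain.com
Subject: Nice Photo
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_8xriL5W6LSj00Ly"

--Boundary-00=_8xriL5W6LSj00Ly
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Greg,

attached you'll find a nice photo.

--Boundary-00=_8xriL5W6LSj00Ly
Content-Type: image/jpeg;
  name="test.jpeg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
  filename="test.jpeg"

/9j/4AAQSkZJRg==

--Boundary-00=_8xriL5W6LSj00Ly--
"""


def content_tree(part, depth=0):
    """Return one indented line per node, children below their parent."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        # For multipart nodes, the payload is the list of child nodes.
        for child in part.get_payload():
            lines.extend(content_tree(child, depth + 1))
    return lines


message = message_from_string(RAW_MESSAGE)
tree = content_tree(message)
print("\n".join(tree))
```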
0472 
Each MIME part can have a different content transfer encoding. In the above example, the text part has
a `7bit` CTE, while the image part has a `base64` CTE. The multipart/mixed node does not specify
a CTE; multipart nodes always have `7bit` as the CTE. This is because the body of multipart nodes can
only consist of bytes in the 7-bit range: The boundary is 7-bit, the MIME headers are 7-bit, and the
MIME bodies are already encoded with the CTE of the child MIME part, and are therefore also 7-bit. This means
no CTE for multipart nodes is necessary.
0479 
0480 The MIME part for the image does not specify a charset parameter in the content type header field. This
0481 is because the body of that MIME part will not be interpreted as a text string, therefore the byte array
0482 does not need to be decoded to a string. Instead, the byte array is interpreted as an image, by an image
0483 renderer. The message viewer application passes the MIME part body as a byte array to the image renderer.
0484 The content type consists of a **media type** and a **subtype**. For example, the content type
0485 `"text/html"` has the media type "text" and the subtype "html". Only nodes that have the media type "text"
0486 need to specify a charset, as those nodes are the only nodes of which the body is interpreted as a text string.
0487 
The only header field not yet encountered in previous sections is the **Content-Disposition** header field,
which is defined in a [separate RFC](https://tools.ietf.org/html/rfc2183). It describes how
the message viewer application should display the MIME part. In the case of the image part, it should
be presented as an attachment. The **filename** parameter tells the message viewer application which filename
should be used by default when the user saves the attachment to disk.
0493 
0494 The content type header field for the image MIME part has a **name** parameter, which is similar to the
0495 `filename` parameter of the `Content-Disposition` header field. The difference is that `name` refers
0496 to the name of the complete MIME part, whereas `filename` refers to the name of the attachment. The
0497 `name` parameter of the `Content-Type` header field in this case is superfluous and only exists for
0498 backwards compatibility, and can be ignored;
0499 the `filename` parameter of the `Content-Disposition` header field should be preferred when it is present.
0500 
0501     From: Thomas McGuire <thomas@domain.com>
0502     To: sebastian@domain.com
0503     Subject: Help with SPARQL
0504     Date: Sun, 28 Feb 2010 21:57:51 +0100
0505     MIME-Version: 1.0
0506     Content-Type: Multipart/Mixed;
0507       boundary="Boundary-00=_PjtiLU2PvHpvp/R"
0508     
0509     --Boundary-00=_PjtiLU2PvHpvp/R
0510     Content-Type: Text/Plain;
0511       charset="us-ascii"
0512     Content-Transfer-Encoding: 7bit
0513     
0514     Hi Sebastian,
0515     
0516     I have a problem with a SPARQL query, can you help me debug this? Attached is
0517     the query and a screenshot showing the result.
0518     
0519     --Boundary-00=_PjtiLU2PvHpvp/R
0520     Content-Type: text/plain;
0521       charset="UTF-8";
0522       name="query.txt"
0523     Content-Transfer-Encoding: 7bit
0524     Content-Disposition: attachment;
0525       filename="query.txt"
0526     
0527     prefix nco:<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>
0528     
0529     SELECT ?person
0530     WHERE {
0531      ?person a nco:PersonContact .
0532      ?person nco:birthDate ?birthDate .
    }
0534     --Boundary-00=_PjtiLU2PvHpvp/R
0535     Content-Type: image/png;
0536       name="screenshot.png"
0537     Content-Transfer-Encoding: base64
0538     Content-Disposition: attachment;
0539       filename="screenshot.png"
0540     
0541     AAAAyAAAAAEBBAABAAAAyAAAAA0BAgATAAAAcQAAABIBAwABAAAAAQAAADEBAgAPAAAAhAAAAGmH
0542     [SNIP]
0543     YXJlLmpwZWcAZGlnaUthbS0w
0544     
0545     --Boundary-00=_PjtiLU2PvHpvp/R--
0546 
0547 The above example message consists of three MIME parts: The main text part and two attachments.
0548 One attachment has the media type `text`, therefore a charset parameter is necessary to correctly
0549 display it. The MIME tree looks like this:
0550 
    multipart/mixed
    |- text/plain
    |- text/plain
    \- image/png
0555 
0556 ### HTML Messages ### {#multipart-alternative}
0557 
0558     From: Thomas McGuire <thomas@domain.com>
0559     Subject: HTML test
0560     Date: Thu, 4 Mar 2010 13:59:18 +0100
0561     MIME-Version: 1.0
0562     Content-Type: multipart/alternative;
0563       boundary="Boundary-01=_m66jLd2/vZrH5oe"
0564     Content-Transfer-Encoding: 7bit
0565     
0566     --Boundary-01=_m66jLd2/vZrH5oe
0567     Content-Type: text/plain;
0568       charset="us-ascii"
0569     Content-Transfer-Encoding: 7bit
0570     
0571     Hello World
0572     
0573     --Boundary-01=_m66jLd2/vZrH5oe
0574     Content-Type: text/html;
0575       charset="us-ascii"
0576     Content-Transfer-Encoding: 7bit
0577     
0578     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
0579     <html>
0580       <head></head>
0581       <body>
0582         Hello <b>World</b>
0583       </body>
0584     </html>
0585     --Boundary-01=_m66jLd2/vZrH5oe--
0586 
The above example is a simple HTML message. It consists of a plain text part and an HTML part, which are
in a **multipart/alternative** container. The message has the following structure:
0589 
0590     multipart/alternative
0591     |- text/plain
0592     \- text/html
0593 
The HTML part and the plain text part have identical content, except that the HTML part contains
additional markup, in this case for displaying the word `World` in bold. Since those parts are in a
multipart/alternative container, the message viewer application can freely choose which part it displays.
Some users might prefer reading the message in HTML format, some might prefer reading the message
in plain text format.
0599 
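How a message viewer application might pick one of the alternatives can be sketched like this. The function
`chooseAlternative` and the `preferHtml` setting are assumptions made for this example. Falling back to the
first child when the preferred format is missing is a simplification as well; a real viewer would also have
to check which of the remaining formats it is actually able to display.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the child of a multipart/alternative node that should be displayed.
// `childTypes` holds the lower-cased content types of the children, in message order.
std::size_t chooseAlternative(const std::vector<std::string> &childTypes, bool preferHtml)
{
    assert(!childTypes.empty());
    const std::string wanted = preferHtml ? "text/html" : "text/plain";
    for (std::size_t i = 0; i < childTypes.size(); ++i) {
        if (childTypes[i] == wanted) {
            return i;
        }
    }
    // The preferred format is not available: fall back to the first alternative.
    return 0;
}
```
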
Of course, an HTML message could also consist of only a single `text/html` part, without the multipart/alternative
container and therefore without an alternative plain text part. However, people preferring the plain
text version wouldn't like this, especially if their mail client has no HTML engine and they would only see
the HTML source including all tags. Therefore, HTML messages should always include an alternative plain text part.
0604 
HTML messages can of course also contain attachments. In this case, the message contains both a
multipart/alternative and a multipart/mixed node. For example, an HTML
message that has an image attachment has the following structure:
0608 
0609     multipart/mixed
0610     |- multipart/alternative
0611     |  |- text/plain
0612     |  \- text/html
0613     \- image/png
0614 
0615 The message itself would look like this:
0616 
0617     From: Thomas McGuire <thomas@domain.com>
0618     Subject: HTML message with an attachment
0619     Date: Thu, 4 Mar 2010 15:20:26 +0100
0620     MIME-Version: 1.0
0621     Content-Type: Multipart/Mixed;
0622       boundary="Boundary-00=_qG8jLwWCwkUfJV1"
0623     
0624     --Boundary-00=_qG8jLwWCwkUfJV1
0625     Content-Type: multipart/alternative;
0626       boundary="Boundary-01=_qG8jLfs1FRmlOhl"
0627     Content-Transfer-Encoding: 7bit
0628     
0629     --Boundary-01=_qG8jLfs1FRmlOhl
0630     Content-Type: text/plain;
0631       charset="us-ascii"
0632     Content-Transfer-Encoding: 7bit
0633     
0634     Hello World
0635     
0636     --Boundary-01=_qG8jLfs1FRmlOhl
0637     Content-Type: text/html;
0638       charset="us-ascii"
0639     Content-Transfer-Encoding: 7bit
0640     
0641     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
0642     <html>
0643       <head></head>
0644       <body>
0645         Hello <b>World</b>
0646       </body>
0647     </html>
0648     --Boundary-01=_qG8jLfs1FRmlOhl--
0649     
0650     --Boundary-00=_qG8jLwWCwkUfJV1
0651     Content-Type: image/png;
0652       name="test.png"
0653     Content-Transfer-Encoding: base64
0654     Content-Disposition: attachment;
0655       filename="test.png"
0656     
0657     iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAACXBIWXMAAA8SAAAPEgEhm/IzAAAC
0658     [SNIP]
0659     eFkXsFgBMG4fJhYlx+iyB3cLpNZwYr/iP7teTwNYa7DZAAAAAElFTkSuQmCC
0660     
0661     --Boundary-00=_qG8jLwWCwkUfJV1--
0662 
0663 ### HTML Messages with Inline Images ### {#multipart-related}
0664 
0665 HTML has support for showing images, with the `img` tag. Such an image is shown at the place where
0666 the `img` tag occurs, which is called an **inline image**. Note that inline images are different
0667 from images that are just normal attachments: Normal attachments are always shown at the beginning or
0668 at the end of the message, while inline images are shown in-place. In HTML, the `img` tag points to an
0669 image file that is either a file on disk or a URL of an image on the Internet. To make inline images
0670 work with MIME messages, a different mechanism is needed, since the image is not a file on disk or on
0671 the Internet, but a MIME part somewhere in the same message. As specified in
0672 [RFC 2557](https://tools.ietf.org/html/rfc2557), the way this can be done is by referring
0673 to a **Content-ID** in the `img` tag, and marking the MIME part that is the image with that content
0674 ID as well.
0675 
0676 An example will probably be more clear than this explanation:
0677 
0678     From: Thomas McGuire <thomas@domain.com>
    Subject: Inline Image Test
0680     Date: Thu, 4 Mar 2010 16:54:53 +0100
0681     MIME-Version: 1.0
0682     Content-Type: multipart/related;
0683       boundary="Boundary-02=_Nf9jLpJ2aGp5RQK"
0684     Content-Transfer-Encoding: 7bit
0685     
0686     --Boundary-02=_Nf9jLpJ2aGp5RQK
0687     Content-Type: multipart/alternative;
0688       boundary="Boundary-01=_Nf9jLZ6aPhm3WrN"
0689     Content-Transfer-Encoding: 7bit
0690     Content-Disposition: inline
0691     
0692     --Boundary-01=_Nf9jLZ6aPhm3WrN
0693     Content-Type: text/plain;
0694       charset="us-ascii"
0695     Content-Transfer-Encoding: 7bit
0696     
0697     Text before image
0698     
0699     Text after image
0700     
0701     --Boundary-01=_Nf9jLZ6aPhm3WrN
0702     Content-Type: text/html;
0703       charset="us-ascii"
0704     Content-Transfer-Encoding: 7bit
0705     
0706     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
0707     <html>
0708       <head></head>
0709       <body>
0710         Text before image<br>
0711         <img src="cid:547730348@KDE" /><br>
0712         Text after image
0713       </body>
0714     </html>
0715     --Boundary-01=_Nf9jLZ6aPhm3WrN--
0716     
0717     --Boundary-02=_Nf9jLpJ2aGp5RQK
0718     Content-Type: image/png;
0719       name="test.png"
0720     Content-Transfer-Encoding: base64
0721     Content-Id: <547730348@KDE>
0722     
0723     iVBORw0KGgoAAAANSUhEUgAAAMgAAADICAIAAAAiOjnJAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAg
0724     [SNIP]
0725     AABJRU5ErkJggg==
0726     --Boundary-02=_Nf9jLpJ2aGp5RQK--
0727 
0728 The first thing you'll notice in this example probably is that it has a **multipart/related** node with
0729 the following structure:
0730 
0731     multipart/related
0732     |- multipart/alternative
0733     |  |- text/plain
0734     |  \- text/html
0735     \- image/png
0736 
When the HTML part has an inline image, the HTML part and its image part both have to be contained in a
multipart/related container, like in this example.
In this case, the `img` tag has the source `cid:547730348@KDE`, which is a placeholder that refers
to the `Content-Id` header of another part. The image part contains exactly that value in its `Content-Id`
header, and therefore a message viewer application can connect both.
0742 
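Connecting an `img` source with the matching MIME part can be sketched as a simple lookup. The `Part` struct and
the function names are hypothetical and not part of KMime. The sketch also assumes that the `cid:` scheme is
written in lower case and that the content ID contains no characters that would be percent-encoded in the URL.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for a MIME part, just enough for this example.
struct Part {
    std::string contentType;
    std::string contentId; // raw header value, e.g. "<547730348@KDE>"
};

// Removes the angle brackets that surround the value of a Content-Id header.
std::string stripAngleBrackets(const std::string &id)
{
    if (id.size() >= 2 && id.front() == '<' && id.back() == '>') {
        return id.substr(1, id.size() - 2);
    }
    return id;
}

// Returns the part that a "cid:" URL refers to, or nullptr if there is none.
const Part *findPartForUrl(const std::vector<Part> &parts, const std::string &url)
{
    const std::string scheme = "cid:";
    if (url.compare(0, scheme.size(), scheme) != 0) {
        return nullptr; // not a reference to a MIME part of this message
    }
    const std::string wanted = url.substr(scheme.size());
    if (wanted.empty()) {
        return nullptr;
    }
    for (const Part &part : parts) {
        if (stripAngleBrackets(part.contentId) == wanted) {
            return &part;
        }
    }
    return nullptr;
}
```
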
0743 The plain text part cannot have inline images, therefore its text might seem a bit confusing.
0744 
HTML messages with inline images can of course also have attachments, in which case the message structure
becomes a mix of multipart/related, multipart/alternative and multipart/mixed. The following example
shows the structure of a message with two inline images and one `.tar.gz` attachment:
0748 
0749     multipart/mixed
0750     |- multipart/related
0751     |  |- multipart/alternative
0752     |  |  |- text/plain
0753     |  |  \- text/html
0754     |  |- image/png
0755     |  \- image/png
0756     \- application/x-compressed-tar
0757 
The structure of MIME messages can get arbitrarily complex; the above is just one relatively simple example.
The nesting of multipart nodes can get much deeper, as there is no restriction on nesting levels.
0760 
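Because the nesting depth is not limited, a MIME tree is naturally handled with recursion. The following sketch
prints a tree in the notation used in this document. `Node`, `appendTree` and `renderTree` are names invented
for this example; in KMime itself, the tree is formed by `KMime::Content` objects instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal stand-in for a node of the MIME tree.
struct Node {
    std::string contentType;
    std::vector<Node> children; // only multipart nodes have children
};

// Appends one line per descendant of `node` to `out`, indented by `prefix`.
void appendTree(const Node &node, const std::string &prefix, std::string &out)
{
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const bool last = (i + 1 == node.children.size());
        out += prefix + (last ? "\\- " : "|- ") + node.children[i].contentType + "\n";
        // Below the last child, no vertical line is drawn anymore.
        appendTree(node.children[i], prefix + (last ? "   " : "|  "), out);
    }
}

// Renders the complete tree, starting with the top-level node.
std::string renderTree(const Node &root)
{
    std::string out = root.contentType + "\n";
    appendTree(root, "", out);
    return out;
}
```
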
0761 ### Encapsulated messages ### {#encapsulated}
0762 
0763 Encapsulated messages are messages which are attachments to another message. The most common example
0764 is a forwarded mail, like in this example:
0765 
0766     From: Frank <frank@domain.com>
0767     To: Bob <bob@domain.com>
0768     Subject: Fwd: Blub
0769     MIME-Version: 1.0
0770     Content-Type: Multipart/Mixed;
0771       boundary="Boundary-00=_sX+jLVPkV1bLFdZ"
0772     
0773     --Boundary-00=_sX+jLVPkV1bLFdZ
0774     Content-Type: text/plain;
0775       charset="us-ascii"
0776     Content-Transfer-Encoding: 7bit
0777     
0778     Hi Bob,
0779     
0780     hereby I forward you an interesting message from Greg.
0781     
0782     --Boundary-00=_sX+jLVPkV1bLFdZ
0783     Content-Type: message/rfc822;
0784       name="forwarded message"
0785     Content-Transfer-Encoding: 7bit
0786     Content-Description: Forwarded Message
0787     Content-Disposition: inline
0788     
0789     From: Greg <greg@domain.com>
0790     To: Frank <frank@domain.com>
0791     Subject: Blub
0792     MIME-Version: 1.0
0793     Content-Type: Text/Plain;
0794       charset="us-ascii"
0795     Content-Transfer-Encoding: 7bit
0796     
0797     Bla Bla Bla
0798     
0799     --Boundary-00=_sX+jLVPkV1bLFdZ--
0800 
0801 
    multipart/mixed
    |- text/plain
    \- message/rfc822
       \- text/plain
0806 
0807 The attached message is treated like any other attachment, and therefore the top-level content type
0808 is multipart/mixed.
0809 The most interesting part is the `message/rfc822` MIME part. As usual, it has some MIME headers, like
0810 `Content-Type` or `Content-Disposition`, followed by the MIME body. The MIME body in this case is
0811 the attached message. Since it is a message, it consists of a header and a body itself.
0812 Therefore, the `message/rfc822` MIME part appears to have two headers; in reality, it is the normal
0813 MIME header and the message header of the encapsulated message. The message header and the message body
0814 are both in the MIME body of the `message/rfc822` MIME part.
0815 
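The two levels of headers can be made visible by splitting at the first empty line twice: once for the
`message/rfc822` MIME part, and once more for its MIME body. The helper `splitHeadAndBody` below is a hypothetical
sketch in standard C++ that assumes LF line endings and a non-empty header; it is not the KMime API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Splits a MIME part or a message into (head, body) at the first empty line.
std::pair<std::string, std::string> splitHeadAndBody(const std::string &content)
{
    const std::size_t separator = content.find("\n\n");
    if (separator == std::string::npos) {
        return {content, std::string()}; // only a header, no body
    }
    // The head keeps the line break of its last line, the empty line is dropped.
    return {content.substr(0, separator + 1), content.substr(separator + 2)};
}
```
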
0816 ### Signed and Encrypted Messages ### {#crypto}
0817 
0818 MIME messages can be cryptographically signed and/or encrypted. The format for those messages is
0819 defined in [RFC 1847](https://tools.ietf.org/html/rfc1847), which specifies two new
0820 multipart subtypes, **multipart/signed** and **multipart/encrypted**. The crypto format of these new
0821 security multiparts is defined in additional RFCs; the most common formats are
0822 [OpenPGP](https://tools.ietf.org/html/rfc3156) and [S/MIME](https://tools.ietf.org/html/rfc2633).
0823 Both formats use the principle of [public-key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography).
0824 OpenPGP uses **keys**, and S/MIME uses **certificates**. For easier text flow, only the term `key` will be used
0825 for both keys and certificates in the text below.
0826 
Security multiparts only sign or encrypt a specific MIME part. The consequence is that the message headers
cannot be signed or encrypted. This also means that it is possible to sign or encrypt only some of
the MIME parts of a message, while leaving other MIME parts unsigned or unencrypted. Furthermore, it
is possible to sign or encrypt different MIME parts with different crypto formats. As you can see,
security multiparts are very flexible.
0832 
0833 Security multiparts are not supported by KMime. However, it is possible for applications to use KMime
0834 when providing support for crypto messages. For example, the messageviewer
0835 component in KDE PIM's [messagelib](https://api.kde.org/kdepim/messagelib/html/index.html) supports signed and encrypted MIME parts, and the
0836 messagecomposer library can create
0837 such messages.
0838 
Signed MIME parts are signed with the private key of the sender, and everybody who has the
public key of the sender can verify the signature. Encrypted MIME parts are encrypted with the public
key of the receiver, and only the receiver, who is the sole person possessing the private key, can decrypt
them. When a message is sent to multiple recipients, it therefore has to be encrypted for the key of each receiver.
In practice, OpenPGP and S/MIME do this by encrypting the content once with a random session key and
then encrypting that session key separately for each receiver, so that a single message is sufficient.
0844 
0845 #### Signed MIME parts ####
0846 
0847 A multipart/signed MIME part has exactly two children: The first child is the content that is signed,
0848 and the second child is the signature.
0849 
0850     From: Thomas McGuire <thomas@domain.com>
0851     Subject: My Subject
0852     Date: Mon, 15 Mar 2010 12:20:16 +0100
0853     MIME-Version: 1.0
0854     Content-Type: multipart/signed;
0855       boundary="nextPart2567247.O5e8xBmMpa";
0856       protocol="application/pgp-signature";
0857       micalg=pgp-sha1
0858     Content-Transfer-Encoding: 7bit
0859     
0860     --nextPart2567247.O5e8xBmMpa
0861     Content-Type: Text/Plain;
0862       charset="us-ascii"
0863     Content-Transfer-Encoding: 7bit
0864     
0865     Simple message
0866     
0867     --nextPart2567247.O5e8xBmMpa
0868     Content-Type: application/pgp-signature; name=signature.asc
0869     Content-Description: This is a digitally signed message part.
0870     
0871     -----BEGIN PGP SIGNATURE-----
0872     Version: GnuPG v2.0.14 (GNU/Linux)
0873     
0874     iEYEABECAAYFAkueF/UACgkQKglv3sO8a1MdTACgnBEP6ZUal931Vwu7PyiXT1bn
0875     Zr0Anj4bAI9JhHEDiwA/iwrWGfSC+Nlz
0876     =d2ol
0877     -----END PGP SIGNATURE-----
0878     --nextPart2567247.O5e8xBmMpa--
0879 
0880 
0881     multipart/signed
0882     |- text/plain
0883     \- application/pgp-signature
0884 
The example here uses the OpenPGP format to sign a simple plain text message. Here, the text/plain
MIME part is signed, and the application/pgp-signature MIME part contains the signature data, which in
this case is ASCII-armored.
0888 
As said above, it is possible to sign only some MIME parts. A message which has an image/jpeg attachment
that is signed, but whose main text part is not signed, has the following MIME structure:
0891 
    multipart/mixed
    |- text/plain
    \- multipart/signed
       |- image/jpeg
       \- application/pgp-signature
0897 
0898 It is possible to sign multipart parts as well. Consider the above example that has a plain text part
0899 and an image attachment. Those two parts can be signed together, with the following structure:
0900 
0901     multipart/signed
0902     |- multipart/mixed
0903     |  |- text/plain
0904     |  \- image/jpeg
0905     \- application/pgp-signature
0906 
0907 Signed messages in the S/MIME format use a different content type for the signature data, like here:
0908 
0909     multipart/signed
0910     |- text/plain
0911     \- application/x-pkcs7-signature
0912 
0913 #### Encrypted MIME parts ####
0914 
0915 Multipart/encrypted MIME parts also have exactly two children: The first child contains metadata about
0916 the encrypted data, such as a version number. The second child then contains the actual encrypted data.
0917 
0918     From: someone@domain.com
0919     To: Thomas McGuire <thomas@domain.com>
0920     Subject: Encrypted message
0921     Date: Mon, 15 Mar 2010 12:50:16 +0100
0922     MIME-Version: 1.0
0923     Content-Type: multipart/encrypted;
0924       boundary="nextPart2726747.j47xUGTWKg";
0925       protocol="application/pgp-encrypted"
0926     Content-Transfer-Encoding: 7bit
0927     
0928     --nextPart2726747.j47xUGTWKg
0929     Content-Type: application/pgp-encrypted
0930     Content-Disposition: attachment
0931     
0932     Version: 1
0933     --nextPart2726747.j47xUGTWKg
0934     Content-Type: application/octet-stream
0935     Content-Disposition: inline; filename="msg.asc"
0936     
0937     -----BEGIN PGP MESSAGE-----
0938     Version: GnuPG v2.0.14 (GNU/Linux)
0939     
0940     hQIOA8p5rdC5CBNfEAf+NZVzVq48C1r5opOOiWV96+FUzIWuMQ6u8fzFgI7YVyCn
0941     [SNIP]
0942     =reNr
    -----END PGP MESSAGE-----
    --nextPart2726747.j47xUGTWKg--
0945 
0946 
0947     multipart/encrypted
0948     |- application/pgp-encrypted
0949     \- application/octet-stream
0950 
0951 The encrypted data is contained in the `application/octet-stream` MIME part. Without decrypting
0952 the data, it is unknown what the original content type of the encrypted MIME data is! The encrypted
0953 data could be a simple text/plain MIME part, an image attachment, or a multipart part. The encrypted
0954 data contains both the MIME header and the MIME body of the original MIME part, as the header is needed
0955 to know the content type of the data. The data could as well be of content type multipart/signed, in
0956 which case the message would be both signed and encrypted.
0957 
0958 #### Inline Crypto Formats ####
0959 
0960 Although using the security multiparts `multipart/signed` and `multipart/encrypted` is the recommended
0961 standard, there are other possibilities to sign or encrypt a message. The most common methods are
0962 **Inline OpenPGP** and **S/MIME Opaque**.
0963 
For inline OpenPGP messages, the crypto data is embedded directly in the body of the actual MIME part. For example,
a message with a signed text/plain part might look like this:
0966 
0967     From: someone@domain.com
0968     To: someoneelse@domain.com
0969     Subject: Inline OpenPGP test
0970     MIME-Version: 1.0
0971     Content-Type: text/plain;
0972       charset="us-ascii"
0973     Content-Transfer-Encoding: 7bit
0974     Content-Disposition: inline
0975     
0976     -----BEGIN PGP SIGNED MESSAGE-----
0977     Hash: SHA1
0978     
0979     Inline OpenPGP signed example.
0980     -----BEGIN PGP SIGNATURE-----
0981     Version: GnuPG v2.0.14 (GNU/Linux)
0982     
0983     iEYEARECAAYFAkueJ2EACgkQKglv3sO8a1MS3QCfcsYnJG7uYQxzxz6J5cPF7lHz
0984     WIoAn3PjVPlWibu02dfdFObwd2eJ1jAW
0985     =p3uO
0986     -----END PGP SIGNATURE-----
0987 
0988 Encrypted inline OpenPGP works in a similar way. Opaque S/MIME messages are also similar: For signed
0989 MIME parts, both the signature and the signed data are contained in a single MIME part with a content
0990 type of `application/pkcs7-mime`.
0991 
As security multiparts are preferred over inline OpenPGP and over opaque S/MIME, these formats are not
described in more detail here.
0994 
0995 ### Miscellaneous Points about Messages ### {#misc}
0996 
0997 #### Line Breaks ####
0998 
Each line in a MIME message has to end with a **CRLF**, which is a carriage return followed by a
line feed, written as the escape sequence `\\r\\n`. CR and LF must not appear on their own anywhere else in
a MIME message. Special care needs to be taken with encoded line breaks in binary data, and with
distinguishing soft and hard line breaks when converting between different content transfer encodings.
For more details, have a look at the RFCs.
1004 
1005 While the official format is to have a CRLF at the end of each line, KMime only expects a single LF
1006 for its in-memory storage. Therefore, when loading a message from disk or from a server into KMime, the CRLFs need
1007 to be converted to LFs first, for example with KMime::CRLFtoLF(). The opposite needs to be done when
1008 storing a KMime message somewhere.
1009 
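To illustrate what this conversion does, here is a sketch in standard C++. It is not the KMime implementation;
in real code, the functions provided by KMime should be used. The names `crlfToLf` and `lfToCrlf` are only chosen
for this example, and both functions work on `std::string` instead of the byte arrays that KMime uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts the CRLF line endings of the wire format to single LFs.
std::string crlfToLf(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            continue; // drop the CR, the LF is copied in the next iteration
        }
        out += in[i];
    }
    return out;
}

// Converts single LFs back to CRLF, leaving already existing CRLFs untouched.
std::string lfToCrlf(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
            out += '\r';
        }
        out += in[i];
    }
    return out;
}
```
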
Lines should not be longer than 78 characters and must not be longer than 998 characters, in both cases not counting the CRLF.
1011 
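The two limits can be checked as in the following sketch. The enum `LineCheck` and the function `checkLineLength`
are hypothetical names. The sketch counts bytes, which equals the number of characters as long as the line only
contains 7 bit characters.

```cpp
#include <cassert>
#include <string>

enum class LineCheck { Ok, LongerThanRecommended, LongerThanAllowed };

// `line` is a single line without its line ending.
LineCheck checkLineLength(const std::string &line)
{
    if (line.size() > 998) {
        return LineCheck::LongerThanAllowed; // violates the hard limit
    }
    if (line.size() > 78) {
        return LineCheck::LongerThanRecommended; // valid, but should be avoided
    }
    return LineCheck::Ok;
}
```
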
1012 #### Header Folding and CFWS ####
1013 
1014 Header fields can span multiple lines, which was already shown in some of the examples above where
1015 the parameters of the header field value were in the next line. The header field is said to be
1016 **folded** in this case. In general, header fields can be folded whenever whitespace (**WS**) occurs.
1017 
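The reverse of folding is called unfolding: every line break that is directly followed by whitespace is removed,
while the whitespace itself is kept. A minimal sketch, assuming LF line endings and using the made-up function
name `unfoldHeader`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Joins the lines of a folded header field into a single line.
std::string unfoldHeader(const std::string &folded)
{
    std::string out;
    out.reserve(folded.size());
    for (std::size_t i = 0; i < folded.size(); ++i) {
        // A fold is a line break that is followed by a space or a tab.
        const bool isFold = folded[i] == '\n' && i + 1 < folded.size()
            && (folded[i + 1] == ' ' || folded[i + 1] == '\t');
        if (!isFold) {
            out += folded[i];
        }
    }
    return out;
}
```
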
1018 Header field values can contain **comments**; these comments are semantically invisible and have no
1019 meaning. Comments are surrounded by parentheses.
1020 
1021     Date: Thu, 13
1022           Feb 1969 23:32 -0330 (Newfoundland Time)
1023 
This example shows a folded header that also has a comment (*Newfoundland Time*). The date header is a structured header
field, and therefore it has to obey a defined syntax; however, adding comments and whitespace is
allowed almost anywhere, and they are ignored when parsing the message. Comments and folding
whitespace are together referred to as **CFWS**. Any occurrence of CFWS is semantically regarded
as a single space.
1029 
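Removing comments is less trivial than it looks, because comments can be nested and because parentheses inside
a quoted string do not start a comment. The following sketch shows the idea for the value of a structured header
field. The function `stripComments` is a hypothetical helper, not part of KMime, and it does not collapse the
remaining whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes all comments from the value of a structured header field.
std::string stripComments(const std::string &value)
{
    std::string out;
    int depth = 0;         // nesting level of comments, 0 means outside of any comment
    bool inQuotes = false; // true while inside a quoted string
    for (std::size_t i = 0; i < value.size(); ++i) {
        const char c = value[i];
        if (c == '\\' && i + 1 < value.size() && (inQuotes || depth > 0)) {
            // An escaped character never starts or ends a comment or a quoted string.
            if (depth == 0) {
                out += c;
                out += value[i + 1];
            }
            ++i;
            continue;
        }
        if (depth == 0 && c == '"') {
            inQuotes = !inQuotes;
            out += c;
            continue;
        }
        if (!inQuotes && c == '(') {
            ++depth;
            continue;
        }
        if (!inQuotes && c == ')' && depth > 0) {
            --depth;
            continue;
        }
        if (depth == 0) {
            out += c;
        }
    }
    return out;
}
```
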
# The two in-memory representations of messages # {#string-broken-down}
1031 
1032 There are two representations of messages in memory. The first is called **string representation**
1033 and the other one is called **broken-down representation**.
1034 
The term string representation is somewhat misnamed; a better term would be "byte array representation".
The string representation is just a big array of bytes in memory, and those bytes make up the encoded mail.
The string representation is what is stored on disk or what is received from an IMAP server, for example.
1039 
With the broken-down representation, the mail is *broken down* into smaller structures. For example,
instead of having a single byte array for all headers, the broken-down structure has a list of individual headers,
and each header in that list is again broken down into a structure. While the string representation
is just an array of 7 bit characters that might be encoded, the broken-down representation contains the
decoded text strings.
1045 
1046 As an example, consider the byte array
1047 
1048     "Hugo Maier" <hugo.maier@mailer.domain>
1049 
1050 Although this is just a bunch of 7 bit characters, a human immediately recognizes the broken-down structure and
1051 sees that the display name is "Hugo Maier" and that the localpart of the email address is "hugo.maier".
1052 To illustrate, the broken-down structure could be stored in a structure like this:
1053 
1054     struct Mailbox
1055     {
1056         QString displayName;
1057         QByteArray addressSpec;
1058     };
1059 
The address spec could actually be broken down further into a localpart and a domain.
The process of converting the string representation to a broken-down representation is called **parsing**, and
the reverse is called **assembling**. Parsing a message is necessary when wanting to access or modify the broken-down
structure. For example, when sending a mail, the address spec of a mailbox needs to be passed to the SMTP server,
which means that the recipient headers need to be parsed in order to access that information. Another example is
the message list in a mail application, where the broken-down structure of a mail is needed to display information
like subject, sender and date in the list.
On the other hand, assembling a message is done, for example, in the composer of a mail application, where the mail information
is available in a broken-down form in the composer window, and is then assembled into a final MIME message that is sent with SMTP.
1070 
1071 Parsing is often quite tricky. You should always use the methods from KMime instead of writing parsing
1072 routines yourself. Even the simple mailbox example above is in practice difficult to parse, as many things like comments
1073 and escaped characters need to be taken into consideration.
1074 The same is true for assembling: In the above case, one could be tempted to assemble the mailbox by simply
1075 writing code like this:
1076 
1077     QByteArray stringRepresentation = '"' + displayName + "\" <" + addressSpec + ">";
1078 
However, just like with parsing, you shouldn't be doing assembling yourself. In the above case, for example,
the display name might contain non-ASCII characters, and RFC 2047 encoding would need to be applied. So use
KMime for assembling in all cases.
1082 
1083 When parsing a message and assembling it afterwards, the result might not be the same as the original byte
1084 array. For example, comments in header fields are ignored during parsing and not stored in the broken-down
1085 structure, therefore the assembled message will also not contain comments.
1086 
Messages in memory are usually stored in a broken-down structure so that it is easy to access and
manipulate the message. On disk and on servers, messages are stored in string representation.
1089 
1090 # Overview of KMime classes # {#classes-overview}
1091 
1092 KMime has basically two sets of classes: Classes for headers and classes for MIME
1093 parts. A MIME part is represented by `KMime::Content`. A Content can be parsed from a string representation
1094 and also be assembled from the broken-down representation again. If parsed, it has a list of sub-contents (in case of multipart contents) and a
1095 list of headers. If the Content is not parsed, it stores the headers and the body in a byte array, which can be accessed
1096 with head() and body().
1097 There is also a class `KMime::Message`, which basically is a thin wrapper around Content for the top-level
1098 MIME part. Message also contains convenience methods to access the message headers.
1099 
For headers, there is a class hierarchy, with `KMime::Headers::Base` as the base class, and
`KMime::Headers::Generics::Structured` and `KMime::Headers::Generics::Unstructured` in the next levels. Unstructured is
for headers that don't have a defined structure, like Subject, whereas Structured headers have a
specific structure, like Date. The header classes have methods to parse headers, like `from7BitString()`,
and to assemble them, like `as7BitString()`. Once a header is parsed, the classes provide access to the
broken-down structures; for example the `Date` header has a method `dateTime()`.
The parsing in `from7BitString()` is usually handled by a protected `parse()` function, which in turn calls
parsing functions for different types, like `parseAddressList()` or `parseAddrSpec()` from the `KMime::HeaderParsing`
namespace.
1109 
1110 When modifying messages, the message is first parsed into a broken-down representation. This broken-down
1111 representation can then be accessed and modified with the appropriate functions. After changing the broken-down
1112 structure, it needs to be assembled again to get the modified string representation.
1113 
KMime also comes with some codecs for handling base64 and quoted-printable encoding, with `KMime::Codec`
as the base class.
1116 
1117 # RFCs # {#rfcs}
1118 
1119 * [RFC 5322](https://tools.ietf.org/html/rfc5322): Internet Message Format
1120 * [RFC 5536](https://tools.ietf.org/html/rfc5536): Netnews Article Format
1121 * [RFC 2045](https://tools.ietf.org/html/rfc2045): Multipurpose Internet Mail Extensions (MIME), Part 1: Format of Internet Message Bodies
1122 * [RFC 2046](https://tools.ietf.org/html/rfc2046): Multipurpose Internet Mail Extensions (MIME), Part 2: Media Types
1123 * [RFC 2047](https://tools.ietf.org/html/rfc2047): Multipurpose Internet Mail Extensions (MIME), Part 3: Message Header Extensions for Non-ASCII Text
1124 * [RFC 2048](https://tools.ietf.org/html/rfc2048): Multipurpose Internet Mail Extensions (MIME), Part 4: Registration Procedures
1125 * [RFC 2049](https://tools.ietf.org/html/rfc2049): Multipurpose Internet Mail Extensions (MIME), Part 5: Conformance Criteria and Examples
1126 * [RFC 2231](https://tools.ietf.org/html/rfc2231): MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
* [RFC 2183](https://tools.ietf.org/html/rfc2183): Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field
1128 * [RFC 2557](https://tools.ietf.org/html/rfc2557): MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
1129 * [RFC 1847](https://tools.ietf.org/html/rfc1847): Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
* [RFC 3851](https://tools.ietf.org/html/rfc3851): S/MIME Version 3.1 Message Specification
1131 * [RFC 3156](https://tools.ietf.org/html/rfc3156): MIME Security with OpenPGP
1132 * [RFC 2298](https://tools.ietf.org/html/rfc2298): An Extensible Message Format for Message Disposition Notifications
1133 * [RFC 2646](https://tools.ietf.org/html/rfc2646): The Text/Plain Format Parameter (not supported by KMime)
1134 
1135 # Further Reading # {#section}
1136 
1137 * [Wikipedia article on MIME](https://en.wikipedia.org/wiki/MIME)
1138 * [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/articles/Unicode.html)
1139 * [A tutorial on character code issues](https://www.cs.tut.fi/~jkorpela/chars.html)
1140 * [Online Base64 encoder and decoder](https://www.motobit.com/util/base64-decoder-encoder.asp)
1141 * [Online quoted-printable encoder](https://www.motobit.com/util/quoted-printable-encoder.asp)
* [Online quoted-printable decoder](https://www.motobit.com/util/quoted-printable-decoder.asp)
1143 * [Online charset converter](https://www.motobit.com/util/charset-codepage-conversion.asp)
1144 * [Wikipedia article on public-key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography)