# KMime #

[TOC]

# Introduction # {#introduction}

KMime is a library for handling mail messages and newsgroup articles. Both mail messages and
newsgroup articles are based on the same standard called MIME, which stands for
**Multipurpose Internet Mail Extensions**. In this document, the term *message* is used to
refer to both mail messages and newsgroup articles.

KMime deals solely with the in-memory representation of messages. Topics such as transport or storage
of messages are handled by other libraries, for example by [the mailtransport library](https://api.kde.org/kdepim/kmailtransport/html/index.html)
or by [the KIMAP library](https://api.kde.org/kdepim/kimap/html/index.html).
Similarly, this library does not deal with displaying messages or advanced composing, for those there
are the messageviewer and messagecomposer
components in the KDE PIM [messagelib](https://api.kde.org/kdepim/messagelib/html/index.html) module.

KMime's main function is to parse, modify and assemble messages in-memory. In a
[later section](@ref string-broken-down), *parsing* and *assembling* are actually explained.
KMime provides high-level classes that make these tasks easy.

MIME is defined by various RFCs, see the [RFC section](@ref rfcs) for a list of them.

# Structure of this document # {#structure}

This document will first give an [introduction to the MIME specification](@ref mime-intro), as it is
essential to understand the basics of the structure of MIME messages for using this library.
The introduction here is aimed at users of the library. It gives a broad overview with examples and
omits some details. Developers who wish to modify KMime should read the
[corresponding RFCs](@ref rfcs) as well, but this is not necessary for library users.

After the introduction to the MIME format, the two ways of representing a message in memory are
discussed, the [string representation and the broken down representation](@ref string-broken-down).

This is followed by a section giving an
[overview of the most important KMime classes](@ref classes-overview).

The last sections give a list of [relevant RFCs](@ref rfcs) and provide links for
[further reading](@ref links).

# Structure of MIME messages # {#mime-intro}

## A brief history of the MIME standard ## {#history}

The MIME standard is comparatively new (1993); email and Usenet existed long before the MIME standard came into
existence. Because of this, the MIME standard has to keep backwards compatibility. The email
standard before MIME lacked many capabilities, like encodings other than ASCII, or attachments. These
and other things were later added by MIME. The standard for messages before MIME is defined in
[RFC 5322](https://tools.ietf.org/html/rfc5322). In [RFC 2045](https://tools.ietf.org/html/rfc2045)
to [RFC 2049](https://tools.ietf.org/html/rfc2049), several backward-compatible extensions
to the basic message format are defined, adding support for attachments, different encodings and many
others.

Actually, there is an even older standard, defined in [RFC 733](https://tools.ietf.org/html/rfc733)
(*Standard for the format of ARPA network text messages*, introduced in 1977).
This standard is now obsoleted by RFC 5322, but backwards compatibility is in some cases supported, as
there are still messages in this format around.

Since pre-MIME messages had no way to handle attachments, attachments were sometimes added to the message
text in a [uuencoded](https://en.wikipedia.org/wiki/Uuencoding) form. Although this is also
obsolete, reading uuencoded attachments is still supported by KMime.

After MIME was introduced, people realized that there was no way to have the filename of attachments
encoded in anything other than ASCII. Thus, [RFC 2231](https://tools.ietf.org/html/rfc2231)
was introduced to allow arbitrary encodings for parameter values, such as the attachment filename.

## MIME by examples ## {#examples}

In the following sections, MIME message examples are shown, examined and explained, starting with
a simple message and proceeding to more interesting examples.
You can get additional examples by simply viewing the source of your own messages in your mail client,
or by having a look at the examples in the [various RFCs](@ref rfcs).

### A simple message ### {#simple-email}

    Subject: First Mail
    From: John Doe <john.doe@domain.com>
    Date: Sun, 21 Feb 2010 19:16:11 +0100
    MIME-Version: 1.0

    Hello World!

The above example features a very simple message. The two main parts of this message are the **header**
and the **body**, which are separated by an empty line. The body contains the actual message content,
and the header contains metadata about the message itself. The header consists of several **header fields**,
each of them on its own line. Header fields are made up of the **header field name**, followed by a colon, followed
by the **header field body**.

The **MIME-Version** header field is mandatory for MIME messages. **Subject**,
**From** and **Date** are important header fields; they are usually displayed in the message list of a
mail client. The `Subject` header field can be anything; it does not have a special structure. It is a
so-called **unstructured** header field. In contrast, the `From` and the `Date` header fields have
to follow a special structure: they must be formed in a way that machines can parse. They are **structured**
header fields.
For example, a mail client needs to understand
the `Date` header field so that it can sort the messages by date in the message list.
The exact details of how the header field bodies of structured header fields should be
formed are specified in an RFC.

In this example, the `From` header contains a single email address. More precisely, a single email address is called
a **mailbox**, which is made up of the **display name** (John Doe) and the **address specification** (john.doe@domain.com),
which is enclosed in angle brackets. The `addr-spec` consists of the user name, the **local part**,
and the **domain** name.

Many header fields can contain multiple email addresses, for example the `To` field for messages with
multiple recipients can have a comma-separated list of mailboxes.
A list of mailboxes, together with a display name for the list, forms a **group**, and multiple groups can form an
**address list**. This is however rarely used, you'll most often see a simple list of plain mailboxes.

There are many more possible header fields than shown in this example, and the header can even contain
arbitrary header fields, which usually are prefixed with `X-`, like `X-Face`.

### Encodings and charsets ### {#encodings}

    From: John Doe <john.doe@domain.com>
    Date: Mon, 22 Feb 2010 00:42:45 +0100
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    Gr=FCezi Welt!

The above shows a message that is using a different **charset** than the standard **US-ASCII** charset. The
message body contains the string "Grüezi Welt!", which is **encoded** in a special way.

The **content-type** of this message is **text/plain**, which means that the message is simple text. Later,
other content types will be introduced, such as **text/html**.
If there is no `Content-Type` header
field, it is assumed that the content-type is `text/plain`.

Before MIME was introduced, all messages were limited to the US-ASCII charset. Only the
lower 128 byte values (0 to 127) were allowed to be used, the so-called **7-bit** range. Writing a message in
another charset or using letters from the upper 128 byte values was not allowed.

#### Charset Encoding ####

When talking about charsets, it is important to understand how strings of text are converted to
byte arrays, and the other way around. A message is nothing else than a big array of bytes.
The bytes that form the body of the message somehow need to be interpreted as a text string. Interpreting
a byte array as a text string is called **decoding** the text. Converting a text string to a byte array is called
**encoding** the text. A **codec** (**co**der-**dec**oder) is a utility that can encode and decode text.
In Qt, the class for text strings is QString, and the class for byte arrays is QByteArray.

With the US-ASCII charset, encoding and decoding text is easy: one just has to look at an [ASCII table](https://en.wikipedia.org/wiki/ASCII_table)
to be able to convert text strings to byte arrays and byte arrays to text strings. For
example, the letter 'A' is represented by a single byte with the value of 65. When encountering a byte
with the value 84, we can look that up in the table and see that it represents the letter 'T'.
With the US-ASCII charset, each letter is represented by exactly one byte, which is very convenient.
Even better, all letters commonly used in English text have byte values below 128, so the 7-bit limit
of messages is no problem for text encoded with the US-ASCII charset.
Another example: The string "Hello World!"
is represented by the following byte array:

    48 65 6C 6C 6F 20 57 6F 72 6C 64 21

Note that the byte values are written in hexadecimal form here, not in decimal as earlier.

Now, what if we want to write a message that contains German umlauts or Chinese letters? Those
are not in the ASCII table, therefore a different charset has to be used. There is a wealth of charsets
to choose from. Not all charsets can handle all letters, for example the
[ISO-8859-1](https://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1) charset can handle
German umlauts, but cannot handle Chinese or Arabic letters. The [Unicode standard](https://en.wikipedia.org/wiki/Unicode)
is an attempt to introduce charsets that can handle all known letters in the
world, in all languages. Unicode actually has several charsets, for example [UTF-8](https://en.wikipedia.org/wiki/UTF-8)
and [UTF-16](https://en.wikipedia.org/wiki/UTF-16). In an ideal world, everyone would be using
Unicode charsets, but for historic and legacy reasons, other charsets are still much in use.

Charsets other than US-ASCII don't generally have as nice properties: A single letter can be represented
by multiple bytes, and generally the byte values are not in the 7-bit range. Pay attention to the UTF-8
charset: At first glance, it looks exactly like the US-ASCII charset, as all US-ASCII characters, including the common Latin letters A - Z,
are encoded with the same byte values as with US-ASCII. However, characters outside of US-ASCII, such as 'ü', are
encoded with two or even more bytes. In general, one letter can be encoded in an arbitrary number of bytes, depending
on the charset. One can **not** rely on the `1 letter == 1 byte` assumption.

Now, what should be done when the text string "Grüezi Welt!" should be sent in the body of a message?
The first step is to choose a charset that can represent all of its letters. This already excludes US-ASCII.
Once a charset is chosen, the text string is encoded into a byte array.
"Grüezi Welt!" encoded with the ISO-8859-1 charset produces the following byte array:

    47 72 FC 65 7A 69 20 57 65 6C 74 21

The letter 'ü' here is encoded using a single byte with the value `FC`.
The same string encoded with UTF-8 looks slightly different:

    47 72 C3 BC 65 7A 69 20 57 65 6C 74 21

Here, the letter 'ü' is encoded with two bytes, `C3 BC`. Still, one can see the similarity
between the two charsets for the other letters.

You can try this out yourself: Open your favorite text editor and enter some text with non-latin
letters. Then save the file and view it in a hex editor to see how the text was converted to a
byte array. Make sure to try out setting different charsets in your text editor.

At this point, the text string is successfully converted to a byte array, using e.g. the ISO-8859-1
charset. To indicate which charset was used, a **Content-Type** header field has to be added, with the correct
**charset** parameter. In our example above, that was done. If the charset parameter of the `Content-Type`,
or even the complete `Content-Type` header field is left out, the receiver can not know how to interpret
the byte array! In these cases, the byte array is usually decoded incorrectly, and the text strings contain
wrong letters or lots of question marks. There is even a special term for such wrongly decoded text,
[Mojibake](https://en.wikipedia.org/wiki/Mojibake). It is important to always know what charset
your byte array is encoded with, otherwise an attempt at decoding the byte array into a text string will fail and produce
Mojibake. **There is no such thing as plain text!** If there is no `Content-Type` header field in
a message, the message body should be interpreted as US-ASCII.
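
The two byte arrays shown above, and the Mojibake that results from decoding with the wrong charset, can be reproduced in a few lines. The following is a minimal sketch in Python that uses only the built-in `str.encode` and `bytes.decode`; it illustrates the concept and is not KMime code. The helper name `hex_dump` is a hypothetical name introduced for this example.

```python
# The text string "Grüezi Welt!", written with an escape so that the
# example does not depend on the encoding of this source file.
text = "Gr\u00fcezi Welt!"


def hex_dump(data: bytes) -> str:
    """Format a byte array as space-separated, uppercase hex pairs."""
    return " ".join(f"{byte:02X}" for byte in data)


# Encoding: text string -> byte array, once per charset.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(hex_dump(latin1_bytes))  # 47 72 FC 65 7A 69 20 57 65 6C 74 21
print(hex_dump(utf8_bytes))    # 47 72 C3 BC 65 7A 69 20 57 65 6C 74 21

# Decoding with the wrong charset does not raise an error here; it
# silently produces Mojibake: the two UTF-8 bytes C3 BC are read as
# two separate ISO-8859-1 characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)  # GrÃ¼ezi Welt!

# Decoding with the charset that was used for encoding restores the text.
assert utf8_bytes.decode("utf-8") == text
```

Note that the wrong decoding step succeeds without any warning, which is why the charset always has to travel with the byte array.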

To learn more about charsets and encodings, read
[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/articles/Unicode.html)
and [A tutorial on character code issues](https://www.cs.tut.fi/~jkorpela/chars.html). Especially
the first article should really be read, as the name indicates.

#### Content Transfer Encoding ####

Now, we can't use the byte array that was just created in a message. The string encoded with ISO-8859-1
has the byte value `FC` for the letter 'ü', which is decimal value 252. However, as said earlier,
messages are only valid when all bytes are in the 7-bit range, i.e. have a byte value below 128.
So what should we do for byte values that are greater than 127, how can they be added to messages? The solution
for this is to use a **content transfer encoding** (CTE). A content transfer encoding takes a byte
array as input and transforms it. The output is another byte array, but one which only uses byte values
in the 7-bit range. One such content transfer encoding is **quoted-printable** (QTP), which is used in the
above example. Quoted-printable is easy to understand: When encountering a byte that has a value greater
than 127, it is simply replaced by a '=', followed by the hexadecimal code of the byte value, represented
as letters and digits encoded with ASCII. This means
that a byte with the value 252 is replaced with the ASCII string `=FC`, since `FC`
is the hexadecimal value of 252. The ASCII string `=FC` itself is now three bytes long,
`3D 46 43`. Therefore, the quoted-printable encoding replaces each byte outside of the 7-bit
range with 3 new bytes.
Decoding quoted-printable encoding is also easy: Each time a byte with the value
`3D`, which is the letter '=' in ASCII, is encountered, the next two following bytes are interpreted
as the hex value of the resulting byte. The quoted-printable encoding was invented to make reading the
byte array easy for humans.

The quoted-printable encoding is not a good choice when the input byte array contains lots of bytes
outside the 7-bit range, as the resulting byte array will be three times as big in the worst case,
which is a waste of space. Therefore another content transfer encoding was introduced, **Base64**.
The details of the base64 encoding are too much to write about here; refer to the
[Wikipedia article](https://en.wikipedia.org/wiki/Base64) or the [RFC](https://tools.ietf.org/html/rfc2045#section-6.8)
for details. As an example, the ISO-8859-1 encoded text string "Grüezi Welt!" is, after encoding it with base64,
represented by the following ASCII string: `R3L8ZXppIFdlbHQh`.
To express the same in byte arrays: The byte array `47 72 FC 65 7A 69 20 57 65 6C 74 21`
is, after encoding it with base64,
represented by the byte array `52 33 4C 38 5A 58 70 70 49 46 64 6C 62 48 51 68`.

There are two other content transfer encodings besides quoted printable and base64: **7-bit** and
**8-bit**. 7-bit is just a marker to indicate that no content transfer encoding is used. This is the
case when the byte array is already completely in the 7-bit range, for example when writing English
text using the US-ASCII charset. 8-bit is also a marker to indicate that no content transfer encoding
was used. This time, not because it was not necessary, but because of a special exception, byte values
outside of the 7-bit range are allowed.
For example, some SMTP servers support the
[8BITMIME](https://tools.ietf.org/html/rfc1652) extension, which indicates that they accept
bytes outside of the 7-bit range. In this case, one can simply use the byte arrays as-is, without using
any content transfer encoding. Creating messages with 8-bit content transfer encoding is currently not
supported by KMime. The advantage of 8-bit is that there is no overhead in size, unlike with
base64 or even quoted-printable.

When using one of the four content transfer encodings, i.e. quoted-printable, base64, 7-bit or 8-bit, this
has to be indicated in the header field **Content-Transfer-Encoding**. If the header field is left out,
it is assumed that the content transfer encoding is 7-bit. The example above uses quoted-printable.

    From: John Doe <john.doe@domain.com>
    Date: Mon, 22 Feb 2010 00:42:45 +0100
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="iso-8859-1"
    Content-Transfer-Encoding: base64

    R3L8ZXppIFdlbHQh

The same example, this time encoded with the base64 content transfer encoding.

    From: John Doe <john.doe@domain.com>
    Date: Mon, 22 Feb 2010 00:42:45 +0100
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="utf-8"
    Content-Transfer-Encoding: base64

    R3LDvGV6aSBXZWx0IQ==

Again the same example, this time using UTF-8 as the charset.

    From: John Doe <john.doe@domain.com>
    Date: Mon, 22 Feb 2010 00:42:45 +0100
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="utf-8"
    Content-Transfer-Encoding: quoted-printable

    Gr=C3=BCezi Welt!

The example with a combination of UTF-8 and quoted-printable CTE. As shown earlier, with the
UTF-8 encoding, the letter 'ü' is represented by the two bytes `C3 BC`.
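
The four message bodies above are the result of applying a charset encoding first and a content transfer encoding second. As a rough sketch of that two-step process, the following Python code uses the standard `quopri` and `base64` modules as stand-ins for a MIME library. This is not KMime's API, and `encode_body` and `decode_body` are hypothetical helper names introduced for this example.

```python
import base64
import quopri

TEXT = "Gr\u00fcezi Welt!"  # "Grüezi Welt!"


def encode_body(text: str, charset: str, cte: str) -> bytes:
    """Text string -> charset-encoded bytes -> 7-bit-safe bytes."""
    raw = text.encode(charset)  # step 1: charset encoding
    if cte == "quoted-printable":  # step 2: content transfer encoding
        return quopri.encodestring(raw)
    if cte == "base64":
        return base64.b64encode(raw)
    raise ValueError(f"unsupported content transfer encoding: {cte}")


def decode_body(data: bytes, charset: str, cte: str) -> str:
    """Reverse of encode_body: undo the CTE first, then the charset."""
    if cte == "quoted-printable":
        raw = quopri.decodestring(data)
    elif cte == "base64":
        raw = base64.b64decode(data)
    else:
        raise ValueError(f"unsupported content transfer encoding: {cte}")
    return raw.decode(charset)


for charset in ("iso-8859-1", "utf-8"):
    for cte in ("quoted-printable", "base64"):
        body = encode_body(TEXT, charset, cte)
        # Every encoded body is pure US-ASCII, i.e. in the 7-bit range.
        assert all(byte < 128 for byte in body)
        assert decode_body(body, charset, cte) == TEXT
        print(f"{charset:<10} {cte:<16} {body.decode('ascii')}")
```

The loop prints the same four bodies as the example messages. The receiver needs both the `charset` parameter and the `Content-Transfer-Encoding` value to call the equivalent of `decode_body` correctly.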

    From: John Doe <john.doe@domain.com>
    Date: Mon, 22 Feb 2010 00:42:45 +0100
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="utf-8"
    Content-Transfer-Encoding: 7bit

    Hello World

A different example, showing 7-bit content transfer encoding. Although the UTF-8 charset has lots
of letters that are represented by bytes outside of the 7-bit range, the string "Hello World" can
be fully represented in the 7-bit range here, even with UTF-8.

In the [further reading](@ref links) section, you will find links to web applications that demonstrate
encodings and charsets.

#### Conclusion ####

When adding a text string to the body of a message, it needs to be encoded twice: First, the encoding of the charset
needs to be applied, which transforms the text string into a byte array. Afterwards, the content transfer
encoding has to be applied, which transforms the byte array from the first step into a byte array that
only has bytes in the 7-bit range.

When decoding, the same has to be done, in reverse: One first has to decode the byte array with the content transfer encoding, to get a byte
array that has all 256 possible byte values. Afterwards, the resulting byte array needs to be decoded
with the correct charset, to transform it into a text string. For those two decoding steps, one has to
look at the `Content-Type` and the `Content-Transfer-Encoding` header fields to find the correct
charset and CTE for decoding.

It is important to always keep the charset and the content transfer encoding in mind. Byte arrays and
strings are not to be confused. Byte arrays that are encoded with a CTE are not to be confused with
byte arrays that are **not** encoded with a CTE.

This section showed how to use different charsets in the *body* of a message.
The next section will
show what to do when another charset is needed in one of the *header* field bodies.

### Encoding in Header Fields ### {#header-encoding}

In the last section, we discussed how to use different charsets in the body of a message. But what if
a different charset needs to be added to one of the header fields? For example one might want to write
a mail to a mailbox with the display name "András Manţia" and with the subject "Grüezi!".

The header fields are limited to characters in the 7-bit range, and are interpreted as US-ASCII.
That means the header field names, such as "From: ", are all encoded in US-ASCII. The header field
bodies, such as the "1.0" of `MIME-Version`, are also encoded with US-ASCII. This is mandated by
[the RFC](https://tools.ietf.org/html/rfc5322#section-2).

The `Content-Type` and the `Content-Transfer-Encoding` header fields only apply to the message body,
they have no meaning for other header fields.

This means that any letter in a different charset has to be encoded in some way to satisfy the RFC.
Letters with a different charset are only allowed in some of the header field bodies; the header field
names always have to be in US-ASCII.

    From: Thomas McGuire <thomas@domain.com>
    Subject: =?iso-8859-1?q?Gr=FCezi!?=
    Date: Mon, 22 Feb 2010 14:34:01 +0100
    MIME-Version: 1.0
    To: =?utf-8?q?Andr=C3=A1s?= =?utf-8?q?_Man=C5=A3ia?= <andras@domain.com>
    Content-Type: Text/Plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    bla bla bla

The above example shows how text that is encoded with a different charset than US-ASCII is handled
in the message header. This can be seen in the bodies of the `Subject` header field and the `To` header field.
In this example, the body of the message is unimportant, it is just "bla bla bla" in US-ASCII.
The way the header field bodies are encoded is sometimes referred to as an **RFC2047 string** or as an **encoded word**, which has
its origin in the [RFC](https://tools.ietf.org/html/rfc2047) where this encoding scheme is defined.
RFC2047 strings are only allowed in some of the header fields, like `Subject`, and in the display name
of mailboxes in header fields like `From` and `To`. In other header fields, such as `Date` and
`MIME-Version`, they are not allowed, but they wouldn't make much sense there anyway, since those are
structured header fields with a clearly defined structure.

RFC2047 strings start with "=?" and end with "?=". Between those markers, they consist of three parts:
* The charset, such as "iso-8859-1"
* The encoding, which is "q" or "b"
* The encoded text

These three parts are separated with a '?'. Encoding the third part, the text, is very similar to how
text strings in the message body are encoded: First, the text string is encoded to a byte array using
the charset encoding. Afterwards, the second encoding is used on the result, to ensure that all resulting
bytes are within the 7-bit range.

The *second encoding* here is almost identical to the content transfer encoding. There are two
possible encodings, **b** and **q**. The `b` encoding is the same as the base64 encoding of the content
transfer encoding. The `q` encoding is very similar to the quoted-printable encoding of the content
transfer encoding, but with some small differences that are described in
[the RFC](https://tools.ietf.org/html/rfc2047#section-4.2).

Let's examine the subject of the message, `=?iso-8859-1?q?Gr=FCezi!?=`, in detail:

The first part of the RFC2047 string is the charset, so it is ISO-8859-1 in this case. The second part
is the encoding, which is the `q` encoding here. The last part is the encoded text, which is
`Gr=FCezi!`.
As with the quoted-printable encoding, "=FC" is the encoding for the byte with
the value `FC`, which in the ISO-8859-1 charset is the letter 'ü'. The complete decoded
text is therefore "Grüezi!".

Each RFC2047 string in the header can use a different charset: In this example, the `Subject` uses ISO-8859-1,
`To` uses UTF-8 and the message body uses US-ASCII.

In the `To` header field, two RFC2047 strings are used. A single, bigger, RFC2047 string for the whole
display name could also have been used. In this case, the second RFC2047 string starts with an underscore,
which is decoded as a space in the `q` encoding. The space between the two RFC2047 strings is ignored,
it is just used to separate the two encoded words.

There are some restrictions on RFC2047 strings: They are not allowed to be longer than 75 characters,
which means two or more encoded words have to be used for long text strings. Also, there are some
restrictions on where RFC2047 strings are allowed; most importantly, the address specification must
not be encoded, to be backwards compatible. For further details, refer to the RFC.

### Messages with attachments ### {#multipart-mixed}

Until now, we only looked at messages that had a single text part as the message body. In this section,
we'll examine messages with attachments.

    From: frank@domain.com
    To: greg@domain.com
    Subject: Nice Photo
    Date: Sun, 28 Feb 2010 19:57:00 +0100
    MIME-Version: 1.0
    Content-Type: Multipart/Mixed;
      boundary="Boundary-00=_8xriL5W6LSj00Ly"

    --Boundary-00=_8xriL5W6LSj00Ly
    Content-Type: Text/Plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Hi Greg,

    attached you'll find a nice photo.

    --Boundary-00=_8xriL5W6LSj00Ly
    Content-Type: image/jpeg;
      name="test.jpeg"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
      filename="test.jpeg"

    /9j/4AAQSkZJRgABAQAAAQABAAD/4Q3XRXhpZgAASUkqAAgAAAAHAAsAAgAPAAAAYgAAAAABBAAB
    [SNIP 800 lines]
    ze5CdSH2Z8yTatHSV2veW0rKzeq30//Z

    --Boundary-00=_8xriL5W6LSj00Ly--

*Note: Since the image in this message would be really big, most of it is omitted / snipped here.*

The above example consists of two parts: A normal text part and an image attachment. Messages that
consist of multiple parts are called **multipart** messages. The top-level content-type therefore is
**multipart/mixed**. `Mixed` simply means that the following parts have no relation to each other,
it is just a random mixture of parts. Later, we will look at other types, such as `multipart/alternative`
or `multipart/related`. A **part** is sometimes also called **node**, **content** or **MIME part**.

Each MIME part of the message is separated by a **boundary**, and that boundary
is specified in the top-level content-type header as a parameter. In the message body, the boundary
is prefixed with `"--"`, and the last boundary is suffixed with `"--"`, so that the end of the message can
be detected. When creating a message, care must be taken that the boundary appears nowhere else in the
message, for example in the text part, as the parser would get confused by this.

A MIME part begins right after the boundary. It consists of a **MIME header** and a **MIME body**, which
are separated by an empty line. The MIME header should not be confused with the message header: The
message header contains metadata about the whole message, like subject and date. The MIME header only
contains metadata about the specific MIME part, like the content type of the MIME part.
MIME header
field names always start with `"Content-"`.
The example above shows the three most important MIME header fields. Usually those are the only ones
used. The top-level header of a message actually mixes the message metadata and the MIME metadata into one header: In this
example, the header contains the `Date` header field, which is an ordinary header field, and it contains
the `Content-Type` header field, which is a MIME header field.

MIME parts can be nested, and therefore form a tree. The above example has the following tree:

    multipart/mixed
    |- text/plain
    \- image/jpeg

The `text/plain` node is therefore a `child` of the `multipart/mixed` node. The `multipart/mixed` node
is a `parent` of the other two nodes. The `image/jpeg` node is a **sibling** of the `text/plain` node.
`Multipart` nodes are the only nodes that have children, other nodes are **leaf** nodes.
The body of a multipart node consists of all complete child nodes (MIME header and MIME body), separated
by the boundary.

Each MIME part can have a different content transfer encoding. In the above example, the text part has
a `7bit` CTE, while the image part has a `base64` CTE. The multipart/mixed node does not specify
a CTE, multipart nodes always have `7bit` as the CTE. This is because the body of multipart nodes can
only consist of bytes in the 7 bit range: The boundary is 7 bit, the MIME headers are 7 bit, and the
MIME bodies are already encoded with the CTE of the child MIME part, and are therefore also 7 bit. This means
no CTE for multipart nodes is necessary.

The MIME part for the image does not specify a charset parameter in the content type header field. This
is because the body of that MIME part will not be interpreted as a text string, therefore the byte array
does not need to be decoded to a string.
Instead, the byte array is interpreted as an image, by an image
renderer. The message viewer application passes the MIME part body as a byte array to the image renderer.
The content type consists of a **media type** and a **subtype**. For example, the content type
`"text/html"` has the media type "text" and the subtype "html". Only nodes that have the media type "text"
need to specify a charset, as those nodes are the only nodes of which the body is interpreted as a text string.

The only header field not yet encountered in previous sections is the **Content-Disposition** header field,
which is defined in a [separate RFC](https://tools.ietf.org/html/rfc2183). It describes how
the message viewer application should display the MIME part. In the case of the image part, it should
be presented as an attachment. The **filename** parameter tells the message viewer application which filename
should be used by default when the user saves the attachment to disk.

The content type header field for the image MIME part has a **name** parameter, which is similar to the
`filename` parameter of the `Content-Disposition` header field. The difference is that `name` refers
to the name of the complete MIME part, whereas `filename` refers to the name of the attachment. The
`name` parameter of the `Content-Type` header field in this case is superfluous and only exists for
backwards compatibility, and can be ignored;
the `filename` parameter of the `Content-Disposition` header field should be preferred when it is present.
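
To see the MIME tree and the `Content-Disposition` parameters from a program's point of view, the following sketch parses a shortened version of the message above with Python's standard `email` package. It is an illustration only and does not use KMime. The four-byte base64 payload is a stand-in for the real JPEG data, and `mime_tree` is a hypothetical helper introduced for this example.

```python
from email import message_from_string

BOUNDARY = "Boundary-00=_8xriL5W6LSj00Ly"

# Shortened version of the example message; the image payload is a
# four-byte placeholder instead of the real JPEG data.
RAW_MESSAGE = "\n".join([
    "From: frank@domain.com",
    "To: greg@domain.com",
    "Subject: Nice Photo",
    "MIME-Version: 1.0",
    f'Content-Type: Multipart/Mixed; boundary="{BOUNDARY}"',
    "",
    f"--{BOUNDARY}",
    'Content-Type: Text/Plain; charset="us-ascii"',
    "Content-Transfer-Encoding: 7bit",
    "",
    "Hi Greg,",
    "",
    f"--{BOUNDARY}",
    'Content-Type: image/jpeg; name="test.jpeg"',
    "Content-Transfer-Encoding: base64",
    'Content-Disposition: attachment; filename="test.jpeg"',
    "",
    "/9j/4A==",
    "",
    f"--{BOUNDARY}--",
    "",
])


def mime_tree(part, depth=0):
    """Return (depth, content type, filename) for a part and its children."""
    rows = [(depth, part.get_content_type(), part.get_filename())]
    if part.is_multipart():
        # For multipart nodes, the payload is the list of child parts.
        for child in part.get_payload():
            rows.extend(mime_tree(child, depth + 1))
    return rows


message = message_from_string(RAW_MESSAGE)
for depth, content_type, filename in mime_tree(message):
    print("  " * depth + content_type, filename or "")

image = message.get_payload()[1]
print(image.get_content_disposition())  # attachment
# decode=True undoes the content transfer encoding (base64 here) and
# returns the raw byte array that an image renderer would receive.
print(image.get_payload(decode=True))   # b'\xff\xd8\xff\xe0'
```

Only the `multipart/mixed` node has children; the text and image parts are leaf nodes, and only the image part carries a filename.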

    From: Thomas McGuire <thomas@domain.com>
    To: sebastian@domain.com
    Subject: Help with SPARQL
    Date: Sun, 28 Feb 2010 21:57:51 +0100
    MIME-Version: 1.0
    Content-Type: Multipart/Mixed;
      boundary="Boundary-00=_PjtiLU2PvHpvp/R"

    --Boundary-00=_PjtiLU2PvHpvp/R
    Content-Type: Text/Plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Hi Sebastian,

    I have a problem with a SPARQL query, can you help me debug this? Attached is
    the query and a screenshot showing the result.

    --Boundary-00=_PjtiLU2PvHpvp/R
    Content-Type: text/plain;
      charset="UTF-8";
      name="query.txt"
    Content-Transfer-Encoding: 7bit
    Content-Disposition: attachment;
      filename="query.txt"

    prefix nco:<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>

    SELECT ?person
    WHERE {
      ?person a nco:PersonContact .
      ?person nco:birthDate ?birthDate .
    }
    --Boundary-00=_PjtiLU2PvHpvp/R
    Content-Type: image/png;
      name="screenshot.png"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
      filename="screenshot.png"

    AAAAyAAAAAEBBAABAAAAyAAAAA0BAgATAAAAcQAAABIBAwABAAAAAQAAADEBAgAPAAAAhAAAAGmH
    [SNIP]
    YXJlLmpwZWcAZGlnaUthbS0w

    --Boundary-00=_PjtiLU2PvHpvp/R--

The above example message consists of three MIME parts: The main text part and two attachments.
One attachment has the media type `text`, therefore a charset parameter is necessary to correctly
display it.
The MIME tree looks like this:

    multipart/mixed
    |- text/plain
    |- text/plain
    \- image/png

### HTML Messages ### {#multipart-alternative}

    From: Thomas McGuire <thomas@domain.com>
    Subject: HTML test
    Date: Thu, 4 Mar 2010 13:59:18 +0100
    MIME-Version: 1.0
    Content-Type: multipart/alternative;
      boundary="Boundary-01=_m66jLd2/vZrH5oe"
    Content-Transfer-Encoding: 7bit

    --Boundary-01=_m66jLd2/vZrH5oe
    Content-Type: text/plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Hello World

    --Boundary-01=_m66jLd2/vZrH5oe
    Content-Type: text/html;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
    <html>
    <head></head>
    <body>
    Hello <b>World</b>
    </body>
    </html>
    --Boundary-01=_m66jLd2/vZrH5oe--

The above example is a simple HTML message. It consists of a plain text and an HTML part, which are
in a **multipart/alternative** container. The message has the following structure:

    multipart/alternative
    |- text/plain
    \- text/html

The HTML part and the plain text part have identical content, except that the HTML part contains
additional markup, in this case for displaying the word `World` in bold. Since those parts are in a
multipart/alternative container, the message viewer application can freely choose which part it displays.
Some users might prefer reading the message in HTML format, some might prefer reading the message
in plain text format.

Of course, an HTML message could also consist only of a single `text/html` part, without the multipart/alternative
container and therefore without an alternative plain text part.
However, people preferring the plain
text version wouldn't like this, especially if their mail client has no HTML engine and they would
only see the raw HTML source, including all tags. Therefore, HTML messages should always include an alternative plain text part.

HTML messages can of course also contain attachments. In this case, the message contains both a
multipart/alternative and a multipart/mixed node, for example with the following structure, for an HTML
message that has an image attachment:

    multipart/mixed
    |- multipart/alternative
    |  |- text/plain
    |  \- text/html
    \- image/png

The message itself would look like this:

    From: Thomas McGuire <thomas@domain.com>
    Subject: HTML message with an attachment
    Date: Thu, 4 Mar 2010 15:20:26 +0100
    MIME-Version: 1.0
    Content-Type: Multipart/Mixed;
      boundary="Boundary-00=_qG8jLwWCwkUfJV1"

    --Boundary-00=_qG8jLwWCwkUfJV1
    Content-Type: multipart/alternative;
      boundary="Boundary-01=_qG8jLfs1FRmlOhl"
    Content-Transfer-Encoding: 7bit

    --Boundary-01=_qG8jLfs1FRmlOhl
    Content-Type: text/plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Hello World

    --Boundary-01=_qG8jLfs1FRmlOhl
    Content-Type: text/html;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
    <html>
    <head></head>
    <body>
    Hello <b>World</b>
    </body>
    </html>
    --Boundary-01=_qG8jLfs1FRmlOhl--

    --Boundary-00=_qG8jLwWCwkUfJV1
    Content-Type: image/png;
      name="test.png"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
      filename="test.png"

    iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAACXBIWXMAAA8SAAAPEgEhm/IzAAAC
    [SNIP]
    eFkXsFgBMG4fJhYlx+iyB3cLpNZwYr/iP7teTwNYa7DZAAAAAElFTkSuQmCC

    --Boundary-00=_qG8jLwWCwkUfJV1--

### HTML Messages with Inline Images ### {#multipart-related}

HTML has support for showing images, with the `img` tag. Such an image is shown at the place where
the `img` tag occurs, and is called an **inline image**. Note that inline images are different
from images that are just normal attachments: Normal attachments are usually shown at the beginning or
at the end of the message, while inline images are shown in-place. In HTML, the `img` tag points to an
image file that is either a file on disk or a URL of an image on the Internet. To make inline images
work with MIME messages, a different mechanism is needed, since the image is not a file on disk or on
the Internet, but a MIME part somewhere in the same message. As specified in
[RFC 2557](https://tools.ietf.org/html/rfc2557), the way this can be done is by referring
to a **Content-ID** in the `img` tag, and marking the MIME part that is the image with that content
ID as well.
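
The link between an `img` source and a MIME part can be sketched as follows. This is a minimal illustration
in plain C++ and not KMime API: the `Part` struct and both functions are hypothetical, the content ID
`part1@example.org` is made up, and the sketch assumes that the ID in the `cid:` URL is not percent-encoded.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Converts a URL like "cid:part1@example.org" into the matching Content-Id
// header value "<part1@example.org>". Returns an empty string for other URLs.
std::string contentIdForUrl(const std::string &url)
{
    const std::string scheme = "cid:";
    if (url.size() <= scheme.size())
        return std::string();
    // URL schemes are case-insensitive.
    for (std::size_t i = 0; i < scheme.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(url[i])) != scheme[i])
            return std::string();
    }
    return "<" + url.substr(scheme.size()) + ">";
}

// Hypothetical, heavily simplified stand-in for a MIME part.
struct Part {
    std::string contentId;   // value of the Content-Id header, including <>
    std::string contentType; // for example "image/png"
};

// Returns the part that an `img` source refers to, or nullptr if there is none.
const Part *findInlinePart(const std::vector<Part> &parts, const std::string &imgSrc)
{
    const std::string wanted = contentIdForUrl(imgSrc);
    if (wanted.empty())
        return nullptr;
    for (const Part &part : parts) {
        if (part.contentId == wanted)
            return &part;
    }
    return nullptr;
}
```

A message viewer would call something like `findInlinePart()` for every `img` tag whose source starts
with `cid:` and hand the body of the resulting part to its image renderer.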

An example makes this clearer:

    From: Thomas McGuire <thomas@domain.com>
    Subject: Inline Image Test
    Date: Thu, 4 Mar 2010 16:54:53 +0100
    MIME-Version: 1.0
    Content-Type: multipart/related;
      boundary="Boundary-02=_Nf9jLpJ2aGp5RQK"
    Content-Transfer-Encoding: 7bit

    --Boundary-02=_Nf9jLpJ2aGp5RQK
    Content-Type: multipart/alternative;
      boundary="Boundary-01=_Nf9jLZ6aPhm3WrN"
    Content-Transfer-Encoding: 7bit
    Content-Disposition: inline

    --Boundary-01=_Nf9jLZ6aPhm3WrN
    Content-Type: text/plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Text before image

    Text after image

    --Boundary-01=_Nf9jLZ6aPhm3WrN
    Content-Type: text/html;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
    <html>
    <head></head>
    <body>
    Text before image<br>
    <img src="cid:547730348@KDE" /><br>
    Text after image
    </body>
    </html>
    --Boundary-01=_Nf9jLZ6aPhm3WrN--

    --Boundary-02=_Nf9jLpJ2aGp5RQK
    Content-Type: image/png;
      name="test.png"
    Content-Transfer-Encoding: base64
    Content-Id: <547730348@KDE>

    iVBORw0KGgoAAAANSUhEUgAAAMgAAADICAIAAAAiOjnJAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAg
    [SNIP]
    AABJRU5ErkJggg==
    --Boundary-02=_Nf9jLpJ2aGp5RQK--

The first thing you'll probably notice in this example is that it has a **multipart/related** node with
the following structure:

    multipart/related
    |- multipart/alternative
    |  |- text/plain
    |  \- text/html
    \- image/png

When the HTML part has inline images, the HTML part and its image parts all have to be children of a
multipart/related container, like in this example.
In this case, the `img` tag has the source `cid:547730348@KDE`, which is a placeholder that refers
to the Content-Id header of another part. The image part contains exactly that value in its `Content-Id`
header, and therefore a message viewer application can connect both.

The plain text part cannot have inline images; therefore its text might seem a bit confusing.

HTML messages with inline images can of course also have attachments, in which case the message structure
becomes a mix of multipart/related, multipart/alternative and multipart/mixed. The following example
shows the structure of a message with two inline images and one `.tar.gz` attachment:

    multipart/mixed
    |- multipart/related
    |  |- multipart/alternative
    |  |  |- text/plain
    |  |  \- text/html
    |  |- image/png
    |  \- image/png
    \- application/x-compressed-tar

The structure of MIME messages can get arbitrarily complex; the above is just one relatively simple example.
The nesting of multipart nodes can get much deeper, as there is no restriction on nesting levels.

### Encapsulated messages ### {#encapsulated}

Encapsulated messages are messages which are attachments to another message. The most common example
is a forwarded mail, like in this example:

    From: Frank <frank@domain.com>
    To: Bob <bob@domain.com>
    Subject: Fwd: Blub
    MIME-Version: 1.0
    Content-Type: Multipart/Mixed;
      boundary="Boundary-00=_sX+jLVPkV1bLFdZ"

    --Boundary-00=_sX+jLVPkV1bLFdZ
    Content-Type: text/plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Hi Bob,

    hereby I forward you an interesting message from Greg.

    --Boundary-00=_sX+jLVPkV1bLFdZ
    Content-Type: message/rfc822;
      name="forwarded message"
    Content-Transfer-Encoding: 7bit
    Content-Description: Forwarded Message
    Content-Disposition: inline

    From: Greg <greg@domain.com>
    To: Frank <frank@domain.com>
    Subject: Blub
    MIME-Version: 1.0
    Content-Type: Text/Plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Bla Bla Bla

    --Boundary-00=_sX+jLVPkV1bLFdZ--

The message has the following structure:

    multipart/mixed
    |- text/plain
    \- message/rfc822
       \- text/plain

The attached message is treated like any other attachment, and therefore the top-level content type
is multipart/mixed.
The most interesting part is the `message/rfc822` MIME part. As usual, it has some MIME headers, like
`Content-Type` or `Content-Disposition`, followed by the MIME body. The MIME body in this case is
the attached message. Since it is a message, it consists of a header and a body itself.
Therefore, the `message/rfc822` MIME part appears to have two headers; in reality, it is the normal
MIME header and the message header of the encapsulated message. The message header and the message body
are both in the MIME body of the `message/rfc822` MIME part.

### Signed and Encrypted Messages ### {#crypto}

MIME messages can be cryptographically signed and/or encrypted. The format for those messages is
defined in [RFC 1847](https://tools.ietf.org/html/rfc1847), which specifies two new
multipart subtypes, **multipart/signed** and **multipart/encrypted**. The crypto format of these new
security multiparts is defined in additional RFCs; the most common formats are
[OpenPGP](https://tools.ietf.org/html/rfc3156) and [S/MIME](https://tools.ietf.org/html/rfc2633).
Both formats use the principle of [public-key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography).
OpenPGP uses **keys**, and S/MIME uses **certificates**. For easier text flow, only the term `key` will be used
for both keys and certificates in the text below.

Security multiparts only sign or encrypt a specific MIME part. The consequence is that the message headers
cannot be signed or encrypted. This also means that it is possible to sign or encrypt only some of
the MIME parts of a message, while leaving other MIME parts unsigned or unencrypted. Furthermore, it
is possible to sign or encrypt different MIME parts with different crypto formats. As you can see,
security multiparts are very flexible.

Security multiparts are not supported by KMime. However, it is possible for applications to use KMime
when providing support for crypto messages. For example, the messageviewer
component in KDE PIM's [messagelib](https://api.kde.org/kdepim/messagelib/html/index.html) supports signed and encrypted MIME parts, and the
messagecomposer library can create
such messages.

Signed MIME parts are signed with the private key of the sender, and everybody who has the
public key of the sender can verify the signature. Encrypted MIME parts are encrypted with the public
key of the receiver, and only the receiver, who is the sole person possessing the private key, can decrypt
it. Sending an encrypted message to multiple recipients therefore requires encrypting for the public key
of every receiver. In practice, OpenPGP and S/MIME encrypt the content once with a random session key and
then encrypt that session key separately for each recipient, so the message does not have to be sent multiple times.

#### Signed MIME parts ####

A multipart/signed MIME part has exactly two children: The first child is the content that is signed,
and the second child is the signature.

    From: Thomas McGuire <thomas@domain.com>
    Subject: My Subject
    Date: Mon, 15 Mar 2010 12:20:16 +0100
    MIME-Version: 1.0
    Content-Type: multipart/signed;
      boundary="nextPart2567247.O5e8xBmMpa";
      protocol="application/pgp-signature";
      micalg=pgp-sha1
    Content-Transfer-Encoding: 7bit

    --nextPart2567247.O5e8xBmMpa
    Content-Type: Text/Plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit

    Simple message

    --nextPart2567247.O5e8xBmMpa
    Content-Type: application/pgp-signature; name=signature.asc
    Content-Description: This is a digitally signed message part.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.14 (GNU/Linux)

    iEYEABECAAYFAkueF/UACgkQKglv3sO8a1MdTACgnBEP6ZUal931Vwu7PyiXT1bn
    Zr0Anj4bAI9JhHEDiwA/iwrWGfSC+Nlz
    =d2ol
    -----END PGP SIGNATURE-----
    --nextPart2567247.O5e8xBmMpa--

The message has the following structure:

    multipart/signed
    |- text/plain
    \- application/pgp-signature

The example here uses the OpenPGP format to sign a simple plain text message. Here, the text/plain
MIME part is signed, and the application/pgp-signature MIME part contains the signature data, which in
this case is ASCII-armored.

As said above, it is possible to sign only some MIME parts. A message which has an image/jpeg attachment
that is signed, but whose main text part is not signed, has the following MIME structure:

    multipart/mixed
    |- text/plain
    \- multipart/signed
       |- image/jpeg
       \- application/pgp-signature

It is possible to sign multipart parts as well. Consider the above example that has a plain text part
and an image attachment.
Those two parts can be signed together, with the following structure:

    multipart/signed
    |- multipart/mixed
    |  |- text/plain
    |  \- image/jpeg
    \- application/pgp-signature

Signed messages in the S/MIME format use a different content type for the signature data, like here:

    multipart/signed
    |- text/plain
    \- application/x-pkcs7-signature

#### Encrypted MIME parts ####

Multipart/encrypted MIME parts also have exactly two children: The first child contains metadata about
the encrypted data, such as a version number. The second child then contains the actual encrypted data.

    From: someone@domain.com
    To: Thomas McGuire <thomas@domain.com>
    Subject: Encrypted message
    Date: Mon, 15 Mar 2010 12:50:16 +0100
    MIME-Version: 1.0
    Content-Type: multipart/encrypted;
      boundary="nextPart2726747.j47xUGTWKg";
      protocol="application/pgp-encrypted"
    Content-Transfer-Encoding: 7bit

    --nextPart2726747.j47xUGTWKg
    Content-Type: application/pgp-encrypted
    Content-Disposition: attachment

    Version: 1
    --nextPart2726747.j47xUGTWKg
    Content-Type: application/octet-stream
    Content-Disposition: inline; filename="msg.asc"

    -----BEGIN PGP MESSAGE-----
    Version: GnuPG v2.0.14 (GNU/Linux)

    hQIOA8p5rdC5CBNfEAf+NZVzVq48C1r5opOOiWV96+FUzIWuMQ6u8fzFgI7YVyCn
    [SNIP]
    =reNr
    -----END PGP MESSAGE-----
    --nextPart2726747.j47xUGTWKg--

The message has the following structure:

    multipart/encrypted
    |- application/pgp-encrypted
    \- application/octet-stream

The encrypted data is contained in the `application/octet-stream` MIME part. Without decrypting
the data, it is unknown what the original content type of the encrypted MIME data is! The encrypted
data could be a simple text/plain MIME part, an image attachment, or a multipart part.
The encrypted
data contains both the MIME header and the MIME body of the original MIME part, as the header is needed
to know the content type of the data. The data could as well be of content type multipart/signed, in
which case the message would be both signed and encrypted.

#### Inline Crypto Formats ####

Although using the security multiparts `multipart/signed` and `multipart/encrypted` is the recommended
standard, there are other possibilities to sign or encrypt a message. The most common methods are
**Inline OpenPGP** and **S/MIME Opaque**.

For inline OpenPGP messages, the crypto data is contained inline in the actual MIME part. For example,
a message with a signed text/plain part might look like this:

    From: someone@domain.com
    To: someoneelse@domain.com
    Subject: Inline OpenPGP test
    MIME-Version: 1.0
    Content-Type: text/plain;
      charset="us-ascii"
    Content-Transfer-Encoding: 7bit
    Content-Disposition: inline

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Inline OpenPGP signed example.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.14 (GNU/Linux)

    iEYEARECAAYFAkueJ2EACgkQKglv3sO8a1MS3QCfcsYnJG7uYQxzxz6J5cPF7lHz
    WIoAn3PjVPlWibu02dfdFObwd2eJ1jAW
    =p3uO
    -----END PGP SIGNATURE-----

Encrypted inline OpenPGP works in a similar way. Opaque S/MIME messages are also similar: For signed
MIME parts, both the signature and the signed data are contained in a single MIME part with a content
type of `application/pkcs7-mime`.

As security multiparts are preferred over inline OpenPGP and over opaque S/MIME, these formats are not
covered in more detail here.
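
To make the framing of the inline signed example above a bit more tangible, here is a rough sketch in plain
C++ that separates the signed text from the surrounding armor lines. It is not part of KMime and performs
no cryptographic verification; the function name `extractCleartext()` is made up, and the sketch assumes
LF line endings and the usual OpenPGP convention that text lines starting with a dash are escaped as `- `.

```cpp
#include <sstream>
#include <string>

// Extracts the signed text from an inline OpenPGP signed body.
// Returns false if the body does not have the expected framing.
// No signature verification is done here.
bool extractCleartext(const std::string &body, std::string &text)
{
    std::istringstream in(body);
    std::string line;

    if (!std::getline(in, line) || line != "-----BEGIN PGP SIGNED MESSAGE-----")
        return false;

    // Skip armor headers such as "Hash: SHA1" up to the first empty line.
    while (std::getline(in, line) && !line.empty()) {
    }

    std::string result;
    bool firstLine = true;
    while (std::getline(in, line)) {
        if (line == "-----BEGIN PGP SIGNATURE-----") {
            text = result;
            return true;
        }
        // Undo dash-escaping: "- -- foo" in the armor stands for "-- foo".
        if (line.compare(0, 2, "- ") == 0)
            line.erase(0, 2);
        if (!firstLine)
            result += '\n';
        result += line;
        firstLine = false;
    }
    return false; // the signature block is missing
}
```

Applied to the body of the example message, the sketch yields the single line
`Inline OpenPGP signed example.`; a body without the `BEGIN PGP SIGNED MESSAGE` line is rejected.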

### Miscellaneous Points about Messages ### {#misc}

#### Line Breaks ####

Each line in a MIME message has to end with a **CRLF**, which is a carriage return followed by a
newline, which is the escape sequence `\\r\\n`. CR and LF must not appear in other places in
a MIME message. Special care needs to be taken with encoded line breaks in binary data, and with
distinguishing soft and hard line breaks when converting between different content transfer encodings.
For more details, have a look at the RFCs.

While the official format is to have a CRLF at the end of each line, KMime only expects a single LF
for its in-memory storage. Therefore, when loading a message from disk or from a server into KMime, the CRLFs need
to be converted to LFs first, for example with KMime::CRLFtoLF(). The opposite needs to be done when
storing a KMime message somewhere.

Lines should not be longer than 78 characters and may not be longer than 998 characters.

#### Header Folding and CFWS ####

Header fields can span multiple lines, which was already shown in some of the examples above where
the parameters of the header field value were in the next line. The header field is said to be
**folded** in this case. In general, header fields can be folded whenever whitespace (**WS**) occurs.

Header field values can contain **comments**; these comments are semantically invisible and have no
meaning. Comments are surrounded by parentheses.

    Date: Thu, 13
      Feb 1969 23:32 -0330 (Newfoundland Time)

This example shows a folded header that also has a comment (*Newfoundland Time*). The date header is a structured header
field, and therefore it has to obey a defined syntax; however, adding comments and whitespace is
allowed almost anywhere, and they are ignored when parsing the message.
Comments and whitespace where
folding is allowed are sometimes referred to as **CFWS**. Any occurrence of CFWS is semantically regarded
as a single space.

# The two in-memory representations of messages # {#string-broken-down}

There are two representations of messages in memory. The first is called **string representation**
and the other one is called **broken-down representation**.

String representation is somewhat misnamed;
a better term would be "byte array representation". The string representation is just a big array of
bytes in memory, and those bytes make up the encoded mail. The string representation is what is stored
on disk or what is received from an IMAP server, for example.

With the broken-down representation, the mail is *broken down* into smaller structures. For example,
instead of having a single byte array for all headers, the broken-down structure has a list of individual headers,
and each header in that list is again broken down into a structure. While the string representation
is just an array of 7 bit characters that might be encoded, the broken-down representation contains the
decoded text strings.

As an example, consider the byte array

    "Hugo Maier" <hugo.maier@mailer.domain>

Although this is just a bunch of 7 bit characters, a human immediately recognizes the broken-down structure and
sees that the display name is "Hugo Maier" and that the localpart of the email address is "hugo.maier".
To illustrate, the broken-down structure could be stored in a structure like this:

    struct Mailbox
    {
      QString displayName;
      QByteArray addressSpec;
    };

The address spec actually could be broken down further into a localpart and a domain.
The process of converting the string representation to a broken-down representation is called **parsing**, and
the reverse is called **assembling**.
Parsing a message is necessary when wanting to access or modify the broken-down
structure. For example, when sending a mail,
the address spec of a mailbox needs to be passed to the SMTP server, which means that the recipient headers need to
be parsed in order to access that information. Another example is the message list in a mail application, where the
broken-down structure of a mail is needed
to display information like subject, sender and date in the list.
On the other hand, assembling a message is for example done in the composer of a mail application, where the mail information
is available in a broken-down form in the composer window, and is then assembled into a final MIME message that is then sent with SMTP.

Parsing is often quite tricky. You should always use the methods from KMime instead of writing parsing
routines yourself. Even the simple mailbox example above is in practice difficult to parse, as many things like comments
and escaped characters need to be taken into consideration.
The same is true for assembling: In the above case, one could be tempted to assemble the mailbox by simply
writing code like this:

    QByteArray stringRepresentation = '"' + displayName + "\" <" + addressSpec + ">";

However, just like with parsing, you shouldn't be doing assembling yourself. In the above case, for example,
the display name might contain non-ASCII characters, and RFC 2047 encoding would need to be applied. So use
KMime for assembling in all cases.

When parsing a message and assembling it afterwards, the result might not be the same as the original byte
array. For example, comments in header fields are ignored during parsing and not stored in the broken-down
structure, therefore the assembled message will also not contain comments.
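
To give an idea of what even a "simple" assembler has to care about, the following plain C++ sketch
assembles the mailbox from above while escaping quotes and backslashes in the display name. It is only an
illustration with a made-up function name, `assembleAsciiMailbox()`, it uses `std::string` instead of the Qt
types, and it simply gives up on non-ASCII display names, because the RFC 2047 encoding they would need is
not implemented here. It is not a replacement for the KMime classes.

```cpp
#include <string>

// Assembles `"Display Name" <addr-spec>` for pure ASCII display names.
// Returns false if the display name contains bytes that this sketch cannot
// represent (non-ASCII bytes would need RFC 2047 encoding, line breaks are
// rejected as well).
bool assembleAsciiMailbox(const std::string &displayName,
                          const std::string &addrSpec,
                          std::string &result)
{
    std::string quoted = "\"";
    for (unsigned char c : displayName) {
        if (c > 127 || c == '\r' || c == '\n')
            return false;
        // Quotes and backslashes must be escaped inside a quoted string.
        if (c == '"' || c == '\\')
            quoted += '\\';
        quoted += static_cast<char>(c);
    }
    quoted += '"';

    result = quoted + " <" + addrSpec + ">";
    return true;
}
```

For the display name `Hugo Maier` this produces the byte array shown earlier. A display name such as
`Hugo "The Boss" Maier` comes out with escaped quotes, which the naive concatenation would get wrong.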

Messages in memory are usually stored in a broken-down structure so that it is easy to access and
manipulate the message. On disk and on servers, messages are stored in string representation.

# Overview of KMime classes # {#classes-overview}

KMime has basically two sets of classes: Classes for headers and classes for MIME
parts. A MIME part is represented by `KMime::Content`. A Content can be parsed from a string representation
and also be assembled from the broken-down representation again. If parsed, it has a list of sub-contents (in case of multipart contents) and a
list of headers. If the Content is not parsed, it stores the headers and the body in a byte array, which can be accessed
with `head()` and `body()`.
There is also a class `KMime::Message`, which basically is a thin wrapper around Content for the top-level
MIME part. Message also contains convenience methods to access the message headers.

For headers, there is a class hierarchy, with `KMime::Headers::Base` as the base class, and
`KMime::Headers::Generics::Structured` and `KMime::Headers::Generics::Unstructured` in the next levels. Unstructured is
for headers that don't have a defined structure, like Subject, whereas Structured headers have a
specific structure, like Date. The header classes have methods to parse headers, like `from7BitString()`,
and to assemble them, like `as7BitString()`. Once a header is parsed, the classes provide access to the
broken-down structures; for example the `Date` header has a method `dateTime()`.
The parsing in `from7BitString()` is usually handled by a protected `parse()` function, which in turn calls
parsing functions for different types, like `parseAddressList()` or `parseAddrSpec()` from the `KMime::HeaderParsing`
namespace.

When modifying messages, the message is first parsed into a broken-down representation.
This broken-down
representation can then be accessed and modified with the appropriate functions. After changing the broken-down
structure, it needs to be assembled again to get the modified string representation.

KMime also comes with some codecs for handling base64 and quoted-printable encoding, with `KMime::Codec`
as the base class.

# RFCs # {#rfcs}

* [RFC 5322](https://tools.ietf.org/html/rfc5322): Internet Message Format
* [RFC 5536](https://tools.ietf.org/html/rfc5536): Netnews Article Format
* [RFC 2045](https://tools.ietf.org/html/rfc2045): Multipurpose Internet Mail Extensions (MIME), Part 1: Format of Internet Message Bodies
* [RFC 2046](https://tools.ietf.org/html/rfc2046): Multipurpose Internet Mail Extensions (MIME), Part 2: Media Types
* [RFC 2047](https://tools.ietf.org/html/rfc2047): Multipurpose Internet Mail Extensions (MIME), Part 3: Message Header Extensions for Non-ASCII Text
* [RFC 2048](https://tools.ietf.org/html/rfc2048): Multipurpose Internet Mail Extensions (MIME), Part 4: Registration Procedures
* [RFC 2049](https://tools.ietf.org/html/rfc2049): Multipurpose Internet Mail Extensions (MIME), Part 5: Conformance Criteria and Examples
* [RFC 2231](https://tools.ietf.org/html/rfc2231): MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
* [RFC 2183](https://tools.ietf.org/html/rfc2183): Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field
* [RFC 2557](https://tools.ietf.org/html/rfc2557): MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
* [RFC 1847](https://tools.ietf.org/html/rfc1847): Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
* [RFC 3851](https://tools.ietf.org/html/rfc3851): S/MIME Version 3.1 Message Specification
* [RFC 3156](https://tools.ietf.org/html/rfc3156): MIME Security with OpenPGP
* [RFC 2298](https://tools.ietf.org/html/rfc2298): An Extensible Message Format for Message Disposition Notifications
* [RFC 2646](https://tools.ietf.org/html/rfc2646): The Text/Plain Format Parameter (not supported by KMime)

# Further Reading # {#links}

* [Wikipedia article on MIME](https://en.wikipedia.org/wiki/MIME)
* [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/articles/Unicode.html)
* [A tutorial on character code issues](https://www.cs.tut.fi/~jkorpela/chars.html)
* [Online Base64 encoder and decoder](https://www.motobit.com/util/base64-decoder-encoder.asp)
* [Online quoted-printable encoder](https://www.motobit.com/util/quoted-printable-encoder.asp)
* [Online quoted-printable decoder](https://www.motobit.com/util/quoted-printable-decoder.asp)
* [Online charset converter](https://www.motobit.com/util/charset-codepage-conversion.asp)
* [Wikipedia article on public-key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography)