author | Jeffrey Stedfast <fejj@ximian.com> | 2003-08-12 01:57:45 +0800 |
---|---|---|
committer | Jeffrey Stedfast <fejj@src.gnome.org> | 2003-08-12 01:57:45 +0800 |
commit | b328a21e7c026aaa9cdd5e332ed7e39e0003d8eb (patch) | |
tree | 2c0f3caac1a793197a951958fa7f07cdbf61ac07 /camel/camel-mime-part-utils.c | |
parent | 7b1013be730b11384d0e0af340758bdef3f00330 (diff) | |
New test suite for the mime parser (which is where the below 2 fixes were noticed)
* tests/message/test4.c: New test suite for the mime parser (which
is where the below 2 fixes were noticed).
* camel-mime-parser.c (folder_boundary_check): Calculate 'len' by
subtracting the boundary start from inend rather than 'atleast'.
(folder_scan_content): Calculate 'inend' differently depending on
the EOF state.
2003-08-08 Jeffrey Stedfast <fejj@ximian.com>
* camel-mime-filter-tohtml.c (html_convert): Rather than checking
*inptr == '\n', check inptr >= inend - this gets rid of an Invalid
Read report from valgrind.
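The fix above comes down to comparing the pointer with the end of the buffer before dereferencing it. A minimal sketch of that pattern follows; it is not the actual html_convert code, and `line_length` and `hit_end_of_buffer` are hypothetical helpers used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not Camel code): length of the first line in
 * [inptr, inend), excluding any terminating '\n'.  The buffer is not
 * NUL-terminated, so the pointer is tested against inend BEFORE it is
 * dereferenced. */
static size_t
line_length (const char *inptr, const char *inend)
{
	const char *start = inptr;

	while (inptr < inend && *inptr != '\n')
		inptr++;

	return (size_t) (inptr - start);
}

/* Returns 1 if the scan stopped because the buffer ran out rather than
 * because a '\n' was found.  Checking "inptr >= inend" avoids reading
 * *inptr one byte past the end of the buffer. */
static int
hit_end_of_buffer (const char *inptr, const char *inend)
{
	inptr += line_length (inptr, inend);

	return inptr >= inend;
}
```

Testing `*inptr == '\n'` after the scan would read one byte past the buffer whenever the last line has no newline, which is the kind of access valgrind reports as an Invalid Read.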
* camel-mime-part.c (write_to_stream): Don't necessarily re-encode
just because the encodings differ. Need to look into making it so
that message/rfc822 and multipart parts ignore the
Content-Transfer-Encoding header and just keep their 'encoding'
bits set to DEFAULT.
2003-08-05 Jeffrey Stedfast <fejj@ximian.com>
* providers/imap/camel-imap-folder.c (get_content): Updated.
* camel-mime-message.c (camel_mime_message_init): Don't override
the mime_type here.
(process_header): Updated to use CamelDataWrapper's mime_type
field.
(find_best_encoding): Same.
(best_encoding): Here too.
* camel-digest-folder.c (camel_digest_folder_new): Updated for
CamelMimePart::content_type change.
* camel-mime-part.c (camel_mime_part_init): Override our parent
class's default mime_type.
(camel_mime_part_finalize): Don't need to unref the content_type
anymore.
(process_header): Updated to use CamelDataWrapper's mime_type
field.
(camel_mime_part_set_filename): Same.
(camel_mime_part_get_filename): Same.
(camel_mime_part_get_content_type): Same.
(set_content_object): Here too.
(write_to_stream): Updated.
(construct_from_parser): Updated.
* camel-mime-part.h: Remove the content_type field.
2003-07-31 Jeffrey Stedfast <fejj@ximian.com>
* tests/lib/messages.c (test_message_compare_content): If the
chunks differ, perform a hexdump on the data being compared so
that we may analyse it more easily.
* camel-multipart-signed.c (write_to_stream): Return ssize_t.
* camel-mime-utils.h: Added the CamelMimePartEncodingType enum
here.
* camel-mime-part.h: Removed the CamelMimePartEncodingType enum
from here.
* camel-mime-part.c (write_to_stream): Updated to return
ssize_t. Also minor changes to only re-encode the content stream
if the charset or encoding changed (this way we write it out in
the original raw form if nothing changed).
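One way to read the rule in this entry is as a single predicate: re-encode only when the encoding or the charset differs, otherwise write the raw stream unchanged. The sketch below is an assumption about that decision, not the real write_to_stream logic; the `Encoding` enum, `same_charset` and `needs_reencode` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for Camel's transfer-encoding enum. */
typedef enum { ENC_DEFAULT, ENC_7BIT, ENC_8BIT, ENC_BASE64, ENC_QP } Encoding;

/* Case-insensitive comparison of two charset names.
 * A NULL charset only matches another NULL charset. */
static int
same_charset (const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;

	while (*a && *b) {
		if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
			return 0;
		a++;
		b++;
	}

	return *a == *b;
}

/* Returns 1 when the content has to be re-encoded, 0 when the raw
 * stream can be written out exactly as it was parsed. */
static int
needs_reencode (Encoding part_enc, Encoding content_enc,
		const char *part_charset, const char *content_charset)
{
	return part_enc != content_enc
		|| !same_charset (part_charset, content_charset);
}
```

The 2003-08-08 entry above relaxes this further (differing encodings alone no longer force a re-encode), so treat the predicate as a description of this entry only.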
* camel-mime-part-utils.c
(simple_data_wrapper_construct_from_parser): Drastically
simplify. We no longer scan html content to try and find the
charset, nor do we care about converting the content to UTF-8 and
handling broken windows charsets.
* camel-mime-message.c (find_best_encoding): Use
decode_to_stream() here. Also updated to not assume the content
charset is UTF-8 since it is very likely not the case anymore
since data-wrappers no longer are converted to UTF-8 at parse
time.
* camel-folder-summary.c (summary_build_content_info_message): Use
decode_to_stream instead here too.
* camel-folder-search.c (match_words_1message): Use
decode_to_stream instead of write_to_stream so we can search the
contents.
* camel-data-wrapper.c (camel_data_wrapper_init): Set the default
encoding to DEFAULT.
(write_to_stream): Updated to return ssize_t
(camel_data_wrapper_decode_to_stream): New virtual function to
decode a data wrapper to a stream (results in nearly identical
behaviour to the old write_to_stream method).
(decode_to_stream): Default implementation of above virtual
method. Decodes base64/qp/etc streams.
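To illustrate the kind of work a decode step performs, here is a minimal quoted-printable decoder over an in-memory buffer. It is only a sketch of the encoding rules from RFC 2045 and makes no claim about how Camel's stream filters are implemented; `qp_decode` and `hexval` are hypothetical names.

```c
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int
hexval (unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode quoted-printable data from in[0..inlen) into out, which must
 * have room for at least inlen bytes.  Returns the number of bytes
 * written.  Malformed '=' sequences are copied through unchanged. */
static size_t
qp_decode (const char *in, size_t inlen, char *out)
{
	size_t i = 0, n = 0;

	while (i < inlen) {
		if (in[i] == '=') {
			/* soft line break: "=\n" or "=\r\n" produces no output */
			if (i + 1 < inlen && in[i + 1] == '\n') {
				i += 2;
				continue;
			}
			if (i + 2 < inlen && in[i + 1] == '\r' && in[i + 2] == '\n') {
				i += 3;
				continue;
			}
			/* "=XY" with two hex digits is a single encoded byte */
			if (i + 2 < inlen) {
				int hi = hexval ((unsigned char) in[i + 1]);
				int lo = hexval ((unsigned char) in[i + 2]);

				if (hi >= 0 && lo >= 0) {
					out[n++] = (char) ((hi << 4) | lo);
					i += 3;
					continue;
				}
			}
		}

		out[n++] = in[i++];
	}

	return n;
}
```

A streaming implementation additionally has to carry state across buffer boundaries, because an `=XY` sequence can be split between two reads; this sketch assumes the whole input is available at once.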
* camel-data-wrapper.h: Removed the rawtext bit and added an
encoding member.
svn path=/trunk/; revision=22171
Diffstat (limited to 'camel/camel-mime-part-utils.c')
-rw-r--r-- | camel/camel-mime-part-utils.c | 360 |
1 files changed, 11 insertions, 349 deletions
diff --git a/camel/camel-mime-part-utils.c b/camel/camel-mime-part-utils.c
index 800f23372e..92769b3083 100644
--- a/camel/camel-mime-part-utils.c
+++ b/camel/camel-mime-part-utils.c
@@ -5,7 +5,7 @@
  * Michael Zucchi <notzed@ximian.com>
  * Jeffrey Stedfast <fejj@ximian.com>
  *
- * Copyright 1999, 2000 Ximian, Inc. (www.ximian.com)
+ * Copyright 1999-2003 Ximian, Inc. (www.ximian.com)
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -34,7 +34,6 @@
 
 #include <gal/util/e-iconv.h>
 
-#include "camel-string-utils.h"
 #include "camel-charset-map.h"
 #include "camel-mime-part-utils.h"
 #include "camel-mime-message.h"
@@ -54,278 +53,19 @@
 #define d(x) /*(printf("%s(%d): ", __FILE__, __LINE__),(x))
 	       #include <stdio.h>*/
 
-/* example: <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> */
-
-static const char *
-check_html_charset(char *buffer, int length)
-{
-	CamelHTMLParser *hp;
-	const char *charset = NULL;
-	camel_html_parser_t state;
-	struct _header_content_type *ct;
-
-	/* if we need to first base64/qp decode, do this here, sigh */
-	hp = camel_html_parser_new();
-	camel_html_parser_set_data(hp, buffer, length, TRUE);
-
-	do {
-		const char *data;
-		int len;
-		const char *val;
-
-		state = camel_html_parser_step(hp, &data, &len);
-
-		/* example: <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> */
-
-		switch(state) {
-		case CAMEL_HTML_PARSER_ELEMENT:
-			val = camel_html_parser_tag(hp);
-			d(printf("Got tag: %s\n", val));
-			if (strcasecmp(val, "meta") == 0
-			    && (val = camel_html_parser_attr(hp, "http-equiv"))
-			    && strcasecmp(val, "content-type") == 0
-			    && (val = camel_html_parser_attr(hp, "content"))
-			    && (ct = header_content_type_decode(val))) {
-				charset = header_content_type_param(ct, "charset");
-				charset = e_iconv_charset_name (charset);
-				header_content_type_unref(ct);
-			}
-			break;
-		default:
-			/* ignore everything else */
-			break;
-		}
-	} while (charset == NULL && state != CAMEL_HTML_PARSER_EOF);
-
-	camel_object_unref (hp);
-
-	return charset;
-}
-
-static GByteArray *
-convert_buffer (GByteArray *in, const char *to, const char *from)
-{
-	size_t inleft, outleft, outlen, converted = 0;
-	GByteArray *out = NULL;
-	const char *inbuf;
-	char *outbuf;
-	iconv_t cd;
-
-	if (in->len == 0)
-		return g_byte_array_new();
-
-	d(printf("converting buffer from %s to %s:\n", from, to));
-	d(fwrite(in->data, 1, (int)in->len, stdout));
-	d(printf("\n"));
-
-	cd = e_iconv_open(to, from);
-	if (cd == (iconv_t) -1) {
-		g_warning ("Cannot convert from '%s' to '%s': %s", from, to, strerror (errno));
-		return NULL;
-	}
-
-	outlen = in->len * 2 + 16;
-	out = g_byte_array_new ();
-	g_byte_array_set_size (out, outlen);
-
-	inbuf = in->data;
-	inleft = in->len;
-
-	do {
-		outbuf = out->data + converted;
-		outleft = outlen - converted;
-
-		converted = e_iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
-		if (converted == (size_t) -1) {
-			if (errno != E2BIG && errno != EINVAL)
-				goto fail;
-		}
-
-		/*
-		 * E2BIG   There is not sufficient room at *outbuf.
-		 *
-		 * We just need to grow our outbuffer and try again.
-		 */
-
-		converted = outbuf - (char *)out->data;
-		if (errno == E2BIG) {
-			outlen += inleft * 2 + 16;
-			out = g_byte_array_set_size (out, outlen);
-			outbuf = out->data + converted;
-		}
-
-	} while (errno == E2BIG && inleft > 0);
-
-	/*
-	 * EINVAL  An incomplete multibyte sequence has been encoun
-	 *         tered in the input.
-	 *
-	 * We'll just have to ignore it...
-	 */
-
-	/* flush the iconv conversion */
-	e_iconv (cd, NULL, NULL, &outbuf, &outleft);
-
-	/* now set the true length on the GByteArray */
-	converted = outbuf - (char *)out->data;
-	g_byte_array_set_size (out, converted);
-
-	d(printf("converted data:\n"));
-	d(fwrite(out->data, 1, (int)out->len, stdout));
-	d(printf("\n"));
-
-	e_iconv_close (cd);
-
-	return out;
-
- fail:
-	g_warning ("Cannot convert from '%s' to '%s': %s", from, to, strerror (errno));
-
-	g_byte_array_free (out, TRUE);
-
-	e_iconv_close (cd);
-
-	return NULL;
-}
-
-/* We don't really use the charset argument except for debugging... */
-static gboolean
-broken_windows_charset (GByteArray *buffer, const char *charset)
-{
-	register unsigned char *inptr;
-	unsigned char *inend;
-
-	inptr = buffer->data;
-	inend = inptr + buffer->len;
-
-	while (inptr < inend) {
-		register unsigned char c = *inptr++;
-
-		if (c >= 128 && c <= 159) {
-			g_warning ("Encountered Windows charset parading as %s", charset);
-			return TRUE;
-		}
-	}
-
-	return FALSE;
-}
-
-static gboolean
-is_7bit (GByteArray *buffer)
-{
-	register unsigned int i;
-
-	for (i = 0; i < buffer->len; i++)
-		if (buffer->data[i] > 127)
-			return FALSE;
-
-	return TRUE;
-}
-
-static const char *iso_charsets[] = {
-	"us-ascii",
-	"iso-8859-1",
-	"iso-8859-2",
-	"iso-8859-3",
-	"iso-8859-4",
-	"iso-8859-5",
-	"iso-8859-6",
-	"iso-8859-7",
-	"iso-8859-8",
-	"iso-8859-9",
-	"iso-8859-10",
-	"iso-8859-11",
-	"iso-8859-12",
-	"iso-8859-13",
-	"iso-8859-14",
-	"iso-8859-15",
-	"iso-8859-16"
-};
-
-#define NUM_ISO_CHARSETS (sizeof (iso_charsets) / sizeof (iso_charsets[0]))
-
-static const char *
-canon_charset_name (const char *charset)
-{
-	const char *ptr;
-	char *endptr;
-	int iso;
-
-	if (strncasecmp (charset, "iso", 3) != 0)
-		return charset;
-
-	ptr = charset + 3;
-	if (*ptr == '-' || *ptr == '_')
-		ptr++;
-
-	/* if it's not an iso-8859-# charset, we don't care about it */
-	if (strncmp (ptr, "8859", 4) != 0)
-		return charset;
-
-	ptr += 4;
-	if (*ptr == '-' || *ptr == '_')
-		ptr++;
-
-	iso = strtoul (ptr, &endptr, 10);
-	if (endptr == ptr || *endptr != '\0')
-		return charset;
-
-	if (iso >= NUM_ISO_CHARSETS)
-		return charset;
-
-	return iso_charsets[iso];
-}
-
 /* simple data wrapper */
 static void
 simple_data_wrapper_construct_from_parser (CamelDataWrapper *dw, CamelMimeParser *mp)
 {
-	CamelMimeFilter *fdec = NULL, *fcrlf = NULL;
-	CamelMimeFilterBasicType enctype = 0;
-	size_t len;
-	int decid = -1, crlfid = -1;
-	struct _header_content_type *ct;
-	const char *charset = NULL;
 	char *encoding, *buf;
 	GByteArray *buffer;
 	CamelStream *mem;
+	size_t len;
 
 	d(printf ("simple_data_wrapper_construct_from_parser()\n"));
 
 	/* first, work out conversion, if any, required, we dont care about what we dont know about */
 	encoding = header_content_encoding_decode (camel_mime_parser_header (mp, "Content-Transfer-Encoding", NULL));
-	if (encoding) {
-		if (!strcasecmp (encoding, "base64")) {
-			d(printf("Adding base64 decoder ...\n"));
-			enctype = CAMEL_MIME_FILTER_BASIC_BASE64_DEC;
-		} else if (!strcasecmp (encoding, "quoted-printable")) {
-			d(printf("Adding quoted-printable decoder ...\n"));
-			enctype = CAMEL_MIME_FILTER_BASIC_QP_DEC;
-		} else if (!strcasecmp (encoding, "x-uuencode")) {
-			d(printf("Adding uudecoder ...\n"));
-			enctype = CAMEL_MIME_FILTER_BASIC_UU_DEC;
-		}
-		g_free (encoding);
-
-		if (enctype != 0) {
-			fdec = (CamelMimeFilter *)camel_mime_filter_basic_new_type(enctype);
-			decid = camel_mime_parser_filter_add (mp, fdec);
-		}
-	}
-
-	/* If we're doing text, we also need to do CRLF->LF and may have to convert it to UTF8 as well. */
-	ct = camel_mime_parser_content_type (mp);
-	if (header_content_type_is (ct, "text", "*")) {
-		charset = header_content_type_param (ct, "charset");
-		charset = e_iconv_charset_name (charset);
-
-		if (fdec) {
-			d(printf ("Adding CRLF conversion filter\n"));
-			fcrlf = camel_mime_filter_crlf_new (CAMEL_MIME_FILTER_CRLF_DECODE,
-							    CAMEL_MIME_FILTER_CRLF_MODE_CRLF_ONLY);
-			crlfid = camel_mime_parser_filter_add (mp, fcrlf);
-		}
-	}
 
 	/* read in the entire content */
 	buffer = g_byte_array_new ();
@@ -334,86 +74,16 @@ simple_data_wrapper_construct_from_parser (CamelDataWrapper *dw, CamelMimeParser
 		g_byte_array_append (buffer, buf, len);
 	}
 
-	/* check for broken Outlook/Web mailers that like to send html marked as text/plain */
-	if (header_content_type_is (ct, "text", "plain")) {
-		register const unsigned char *inptr;
-		const unsigned char *inend;
-
-		inptr = buffer->data;
-		inend = inptr + buffer->len;
-
-		while (inptr < inend && isspace ((int) *inptr))
-			inptr++;
-
-		if (((inend-inptr) > 5 && g_ascii_strncasecmp(inptr, "<html", 5) == 0)
-		    || ((inend-inptr) > 9 && g_ascii_strncasecmp(inptr, "<!doctype", 9) == 0)) {
-			/* re-tag as text/html */
-			g_free (ct->subtype);
-			ct->subtype = g_strdup ("html");
-		}
-	}
-
-	/* Possible Lame Mailer Alert... check the META tags for a charset */
-	if (!charset && header_content_type_is (ct, "text", "html")) {
-		if ((charset = check_html_charset (buffer->data, buffer->len)))
-			header_content_type_set_param (ct, "charset", charset);
-	}
-
-	/* if we need to do charset conversion, see if we can/it works/etc */
-	if (charset && !(strcasecmp (charset, "us-ascii") == 0
-			 || strcasecmp (charset, "utf-8") == 0
-			 || strncasecmp (charset, "x-", 2) == 0)) {
-		GByteArray *out;
-
-		/* You often see Microsoft Windows users announcing their texts
-		 * as being in ISO-8859-1 even when in fact they contain funny
-		 * characters from the Windows-CP1252 superset.
-		 */
-		charset = canon_charset_name (charset);
-		if (!strncasecmp (charset, "iso-8859", 8)) {
-			/* check for Windows-specific chars... */
-			if (broken_windows_charset (buffer, charset))
-				charset = camel_charset_iso_to_windows (charset);
-		}
-
-		out = convert_buffer (buffer, "UTF-8", charset);
-		if (out) {
-			/* converted ok, use this data instead */
-			g_byte_array_free(buffer, TRUE);
-			dw->rawtext = FALSE;
-			buffer = out;
-		} else {
-			/* else failed to convert, leave as raw? */
-			g_warning("Storing text as raw, unknown charset '%s' or invalid format", charset);
-			dw->rawtext = TRUE;
-		}
-	} else if (header_content_type_is (ct, "text", "*")) {
-		if (charset == NULL || !strcasecmp (charset, "us-ascii")) {
-			/* check that it's 7bit */
-			dw->rawtext = !is_7bit (buffer);
-		} else if (!strncasecmp (charset, "x-", 2)) {
-			/* we're not even going to bother trying to convert, so set the
-			   rawtext bit to TRUE and let the mailer deal with it. */
-			dw->rawtext = TRUE;
-		} else if (!strcasecmp (charset, "utf-8") && buffer->len) {
-			/* check that it is valid utf8 */
-			dw->rawtext = !g_utf8_validate (buffer->data, buffer->len, NULL);
-		}
-	}
-
 	d(printf("message part kept in memory!\n"));
 
-	mem = camel_stream_mem_new_with_byte_array(buffer);
-	camel_data_wrapper_construct_from_stream(dw, mem);
-	camel_object_unref((CamelObject *)mem);
-
-	camel_mime_parser_filter_remove(mp, decid);
-	camel_mime_parser_filter_remove(mp, crlfid);
+	mem = camel_stream_mem_new_with_byte_array (buffer);
+	camel_data_wrapper_construct_from_stream (dw, mem);
+	camel_object_unref (mem);
 
-	if (fdec)
-		camel_object_unref((CamelObject *)fdec);
-	if (fcrlf)
-		camel_object_unref((CamelObject *)fcrlf);
+	if (encoding) {
+		dw->encoding = camel_mime_part_encoding_from_string (encoding);
+		g_free (encoding);
+	}
 }
 
 /* This replaces the data wrapper repository ... and/or could be replaced by it? */
@@ -424,7 +94,7 @@ camel_mime_part_construct_content_from_parser (CamelMimePart *dw, CamelMimeParse
 	CamelContentType *ct;
 
 	ct = camel_mime_parser_content_type (mp);
-
+
 	switch (camel_mime_parser_state (mp)) {
 	case HSCAN_HEADER:
 		d(printf("Creating body part\n"));
@@ -457,19 +127,11 @@ camel_mime_part_construct_content_from_parser (CamelMimePart *dw, CamelMimeParse
 	default:
 		g_warning("Invalid state encountered???: %d", camel_mime_parser_state (mp));
 	}
+
 	if (content) {
 		/* would you believe you have to set this BEFORE you set the content object??? oh my god !!!! */
 		camel_data_wrapper_set_mime_type_field (content, camel_mime_part_get_content_type (dw));
 		camel_medium_set_content_object ((CamelMedium *)dw, content);
-
-		/* Note: we don't set ct as the content-object's mime-type above because
-		 * camel_medium_set_content_object() may re-write the Content-Type header
-		 * (see CamelMimePart::set_content_object) if we did that (which is a Bad Thing).
-		 * However, if we set it *afterward*, we can still use any special auto-detections
-		 * that we found in simple_data_wrapper_construct_from_parser(). This is important
-		 * later when we go to render the MIME parts in mail-format.c */
-		camel_data_wrapper_set_mime_type_field (content, ct);
-
 		camel_object_unref (content);
 	}
 }
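After this change the parser no longer installs a decoder at parse time; it only records which transfer encoding the raw bytes are in. The sketch below shows what mapping a Content-Transfer-Encoding value to an enum might look like. It is not the body of camel_mime_part_encoding_from_string: the `TransferEncoding` enum, the helper names, and the fallback to a default value for NULL or unknown strings are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical enum; the real CamelMimePartEncodingType may differ. */
typedef enum {
	ENCODING_DEFAULT,
	ENCODING_7BIT,
	ENCODING_8BIT,
	ENCODING_BASE64,
	ENCODING_QUOTEDPRINTABLE,
	ENCODING_BINARY,
	ENCODING_UUENCODE
} TransferEncoding;

/* ASCII case-insensitive string equality. */
static int
equal_nocase (const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
			return 0;
		a++;
		b++;
	}

	return *a == *b;
}

/* Map a Content-Transfer-Encoding value to the enum.  NULL and
 * unrecognised values fall back to ENCODING_DEFAULT (an assumption). */
static TransferEncoding
encoding_from_string (const char *string)
{
	static const struct {
		const char *name;
		TransferEncoding value;
	} table[] = {
		{ "7bit", ENCODING_7BIT },
		{ "8bit", ENCODING_8BIT },
		{ "base64", ENCODING_BASE64 },
		{ "quoted-printable", ENCODING_QUOTEDPRINTABLE },
		{ "binary", ENCODING_BINARY },
		{ "x-uuencode", ENCODING_UUENCODE },
	};
	size_t i;

	if (string == NULL)
		return ENCODING_DEFAULT;

	for (i = 0; i < sizeof (table) / sizeof (table[0]); i++) {
		if (equal_nocase (string, table[i].name))
			return table[i].value;
	}

	return ENCODING_DEFAULT;
}
```

Keeping only the encoding tag on the data wrapper lets write_to_stream emit the original bytes untouched, while decode_to_stream can pick a decoder from the tag when decoded content is actually needed.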