New test suite for the mime parser (which is where the below 2 fixes were

* tests/message/test4.c: New test suite for the mime parser (which is where the below 2 fixes were noticed).

* camel-mime-parser.c (folder_boundary_check): Calculate 'len' by subtracting the boundary start from inend rather than 'atleast'.
(folder_scan_content): Calculate 'inend' differently depending on the EOF state.

2003-08-08 Jeffrey Stedfast <fejj@ximian.com>

* camel-mime-filter-tohtml.c (html_convert): Rather than checking *inptr == '\n', check inptr >= inend - this gets rid of an Invalid Read report from valgrind.

* camel-mime-part.c (write_to_stream): Don't necessarily re-encode just because the encodings differ. Need to look into making it so that message/rfc822 and multipart parts ignore the Content-Transfer-Encoding header and just keep their 'encoding' bits set to DEFAULT.

2003-08-05 Jeffrey Stedfast <fejj@ximian.com>

* providers/imap/camel-imap-folder.c (get_content): Updated.

* camel-mime-message.c (camel_mime_message_init): Don't override the mime_type here.
(process_header): Updated to use CamelDataWrapper's mime_type field.
(find_best_encoding): Same.
(best_encoding): Here too.

* camel-digest-folder.c (camel_digest_folder_new): Updated for CamelMimePart::content_type change.

* camel-mime-part.c (camel_mime_part_init): Override our parent class's default mime_type.
(camel_mime_part_finalize): Don't need to unref the content_type anymore.
(process_header): Updated to use CamelDataWrapper's mime_type field.
(camel_mime_part_set_filename): Same.
(camel_mime_part_get_filename): Same.
(camel_mime_part_get_content_type): Same.
(set_content_object): Here too.
(write_to_stream): Updated.
(construct_from_parser): Updated.

* camel-mime-part.h: Remove the content_type field.

2003-07-31 Jeffrey Stedfast <fejj@ximian.com>

* tests/lib/messages.c (test_message_compare_content): If the chunks differ, perform a hexdump on the data being compared so that we may analyse it easier.

* camel-multipart-signed.c (write_to_stream): Return ssize_t.

* camel-mime-utils.h: Added the CamelMimePartEncodingType enum here.

* camel-mime-part.h: Removed the CamelMimePartEncodingType enum from here.

* camel-mime-part.c (write_to_stream): Updated to return ssize_t. Also minor changes to only re-encode the content stream if the charset or encoding changed (this way we write it out in the original raw form if nothing changed).

* camel-mime-part-utils.c (simple_data_wrapper_construct_from_parser): Drastically simplify. We no longer scan html content to try and find the charset, nor do we care about converting the content to UTF-8 and handling broken windows charsets.

* camel-mime-message.c (find_best_encoding): Use decode_to_stream() here. Also updated to not assume the content charset is UTF-8 since it is very likely not the case anymore since data-wrappers no longer are converted to UTF-8 at parse time.

* camel-folder-summary.c (summary_build_content_info_message): Use decode_to_stream instead here too.

* camel-folder-search.c (match_words_1message): Use decode_to_stream instead of write_to_stream so we can search the contents.

* camel-data-wrapper.c (camel_data_wrapper_init): Set the default encoding to DEFAULT.
(write_to_stream): Updated to return ssize_t
(camel_data_wrapper_decode_to_stream): New virtual function to decode a data wrapper to a stream (results in nearly identical behaviour to the old write_to_stream method).
(decode_to_stream): Default implementation of above virtual method. Decodes base64/qp/etc streams.

* camel-data-wrapper.h: Removed the rawtext bit and added an encoding member.

svn path=/trunk/; revision=22171
author: Jeffrey Stedfast <fejj@ximian.com> 2003-08-12 01:57:45 +0800
committer: Jeffrey Stedfast <fejj@src.gnome.org> 2003-08-12 01:57:45 +0800
commit: b328a21e7c026aaa9cdd5e332ed7e39e0003d8eb (patch)
tree: 2c0f3caac1a793197a951958fa7f07cdbf61ac07 /camel/camel-mime-part-utils.c
parent: 7b1013be730b11384d0e0af340758bdef3f00330 (diff)
download: gsoc2013-evolution-b328a21e7c026aaa9cdd5e332ed7e39e0003d8eb.tar.gz
gsoc2013-evolution-b328a21e7c026aaa9cdd5e332ed7e39e0003d8eb.tar.zst
gsoc2013-evolution-b328a21e7c026aaa9cdd5e332ed7e39e0003d8eb.zip
1 files changed, 11 insertions, 349 deletions
diff --git a/camel/camel-mime-part-utils.c b/camel/camel-mime-part-utils.c
index 800f23372e..92769b3083 100644
--- a/camel/camel-mime-part-utils.c
+++ b/camel/camel-mime-part-utils.c
@@ -5,7 +5,7 @@
  *          Michael Zucchi <notzed@ximian.com>
  *          Jeffrey Stedfast <fejj@ximian.com>
  *
- * Copyright 1999, 2000 Ximian, Inc. (www.ximian.com)
+ * Copyright 1999-2003 Ximian, Inc. (www.ximian.com)
  *
  * This program is free software; you can redistribute it and/or 
  * modify it under the terms of version 2 of the GNU General Public 
@@ -34,7 +34,6 @@
 
 #include <gal/util/e-iconv.h>
 
-#include "camel-string-utils.h"
 #include "camel-charset-map.h"
 #include "camel-mime-part-utils.h"
 #include "camel-mime-message.h"
@@ -54,278 +53,19 @@
 #define d(x) /*(printf("%s(%d): ", __FILE__, __LINE__),(x))
 	       #include <stdio.h>*/
 
-/* example: <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> */
-
-static const char *
-check_html_charset(char *buffer, int length)
-{
-	CamelHTMLParser *hp;
-	const char *charset = NULL;
-	camel_html_parser_t state;
-	struct _header_content_type *ct;
-
-	/* if we need to first base64/qp decode, do this here, sigh */
-	hp = camel_html_parser_new();
-	camel_html_parser_set_data(hp, buffer, length, TRUE);
-	
-	do {
-		const char *data;
-		int len;
-		const char *val;
-		
-		state = camel_html_parser_step(hp, &data, &len);
-		
-		/* example: <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> */
-		
-		switch(state) {
-		case CAMEL_HTML_PARSER_ELEMENT:
-			val = camel_html_parser_tag(hp);
-			d(printf("Got tag: %s\n", val));
-			if (strcasecmp(val, "meta") == 0
-			    && (val = camel_html_parser_attr(hp, "http-equiv"))
-			    && strcasecmp(val, "content-type") == 0
-			    && (val = camel_html_parser_attr(hp, "content"))
-			    && (ct = header_content_type_decode(val))) {
-				charset = header_content_type_param(ct, "charset");
-				charset = e_iconv_charset_name (charset);
-				header_content_type_unref(ct);
-			}
-			break;
-		default:
-			/* ignore everything else */
-			break;
-		}
-	} while (charset == NULL && state != CAMEL_HTML_PARSER_EOF);
-
-	camel_object_unref (hp);
-
-	return charset;
-}
-
-static GByteArray *
-convert_buffer (GByteArray *in, const char *to, const char *from)
-{
-	size_t inleft, outleft, outlen, converted = 0;
-	GByteArray *out = NULL;
-	const char *inbuf;
-	char *outbuf;
-	iconv_t cd;
-	
-	if (in->len == 0)
-		return g_byte_array_new();
-	
-	d(printf("converting buffer from %s to %s:\n", from, to));
-	d(fwrite(in->data, 1, (int)in->len, stdout));
-	d(printf("\n"));
-	
-	cd = e_iconv_open(to, from);
-	if (cd == (iconv_t) -1) {
-		g_warning ("Cannot convert from '%s' to '%s': %s", from, to, strerror (errno));
-		return NULL;
-	}
-	
-	outlen = in->len * 2 + 16;
-	out = g_byte_array_new ();
-	g_byte_array_set_size (out, outlen);
-	
-	inbuf = in->data;
-	inleft = in->len;
-	
-	do {
-		outbuf = out->data + converted;
-		outleft = outlen - converted;
-		
-		converted = e_iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
-		if (converted == (size_t) -1) {
-			if (errno != E2BIG && errno != EINVAL)
-				goto fail;
-		}
-		
-		/*
-		 * E2BIG   There is not sufficient room at *outbuf.
-		 *
-		 * We just need to grow our outbuffer and try again.
-		 */
-		
-		converted = outbuf - (char *)out->data;
-		if (errno == E2BIG) {
-			outlen += inleft * 2 + 16;
-			out = g_byte_array_set_size (out, outlen);
-			outbuf = out->data + converted;
-		}
-		
-	} while (errno == E2BIG && inleft > 0);
-	
-	/*
-	 * EINVAL  An incomplete multibyte sequence has been encountered
-	 *         in the input.
-	 *
-	 * We'll just have to ignore it...
-	 */
-	
-	/* flush the iconv conversion */
-	e_iconv (cd, NULL, NULL, &outbuf, &outleft);
-	
-	/* now set the true length on the GByteArray */
-	converted = outbuf - (char *)out->data;
-	g_byte_array_set_size (out, converted);
-	
-	d(printf("converted data:\n"));
-	d(fwrite(out->data, 1, (int)out->len, stdout));
-	d(printf("\n"));
-	
-	e_iconv_close (cd);
-	
-	return out;
-	
- fail:
-	g_warning ("Cannot convert from '%s' to '%s': %s", from, to, strerror (errno));
-	
-	g_byte_array_free (out, TRUE);
-	
-	e_iconv_close (cd);
-	
-	return NULL;
-}
-
-/* We don't really use the charset argument except for debugging... */
-static gboolean
-broken_windows_charset (GByteArray *buffer, const char *charset)
-{
-	register unsigned char *inptr;
-	unsigned char *inend;
-	
-	inptr = buffer->data;
-	inend = inptr + buffer->len;
-	
-	while (inptr < inend) {
-		register unsigned char c = *inptr++;
-		
-		if (c >= 128 && c <= 159) {
-			g_warning ("Encountered Windows charset parading as %s", charset);
-			return TRUE;
-		}
-	}
-	
-	return FALSE;
-}
-
-static gboolean
-is_7bit (GByteArray *buffer)
-{
-	register unsigned int i;
-	
-	for (i = 0; i < buffer->len; i++)
-		if (buffer->data[i] > 127)
-			return FALSE;
-	
-	return TRUE;
-}
-
-static const char *iso_charsets[] = {
-	"us-ascii",
-	"iso-8859-1",
-	"iso-8859-2",
-	"iso-8859-3",
-	"iso-8859-4",
-	"iso-8859-5",
-	"iso-8859-6",
-	"iso-8859-7",
-	"iso-8859-8",
-	"iso-8859-9",
-	"iso-8859-10",
-	"iso-8859-11",
-	"iso-8859-12",
-	"iso-8859-13",
-	"iso-8859-14",
-	"iso-8859-15",
-	"iso-8859-16"
-};
-
-#define NUM_ISO_CHARSETS (sizeof (iso_charsets) / sizeof (iso_charsets[0]))
-
-static const char *
-canon_charset_name (const char *charset)
-{
-	const char *ptr;
-	char *endptr;
-	int iso;
-	
-	if (strncasecmp (charset, "iso", 3) != 0)
-		return charset;
-	
-	ptr = charset + 3;
-	if (*ptr == '-' || *ptr == '_')
-		ptr++;
-	
-	/* if it's not an iso-8859-# charset, we don't care about it */
-	if (strncmp (ptr, "8859", 4) != 0)
-		return charset;
-	
-	ptr += 4;
-	if (*ptr == '-' || *ptr == '_')
-		ptr++;
-	
-	iso = strtoul (ptr, &endptr, 10);
-	if (endptr == ptr || *endptr != '\0')
-		return charset;
-	
-	if (iso >= NUM_ISO_CHARSETS)
-		return charset;
-	
-	return iso_charsets[iso];
-}
-
 /* simple data wrapper */
 static void
 simple_data_wrapper_construct_from_parser (CamelDataWrapper *dw, CamelMimeParser *mp)
 {
-	CamelMimeFilter *fdec = NULL, *fcrlf = NULL;
-	CamelMimeFilterBasicType enctype = 0;
-	size_t len;
-	int decid = -1, crlfid = -1;
-	struct _header_content_type *ct;
-	const char *charset = NULL;
 	char *encoding, *buf;
 	GByteArray *buffer;
 	CamelStream *mem;
+	size_t len;
 	
 	d(printf ("simple_data_wrapper_construct_from_parser()\n"));
 	
 	/* first, work out conversion, if any, required, we dont care about what we dont know about */
 	encoding = header_content_encoding_decode (camel_mime_parser_header (mp, "Content-Transfer-Encoding", NULL));
-	if (encoding) {
-		if (!strcasecmp (encoding, "base64")) {
-			d(printf("Adding base64 decoder ...\n"));
-			enctype = CAMEL_MIME_FILTER_BASIC_BASE64_DEC;
-		} else if (!strcasecmp (encoding, "quoted-printable")) {
-			d(printf("Adding quoted-printable decoder ...\n"));
-			enctype = CAMEL_MIME_FILTER_BASIC_QP_DEC;
-		} else if (!strcasecmp (encoding, "x-uuencode")) {
-			d(printf("Adding uudecoder ...\n"));
-			enctype = CAMEL_MIME_FILTER_BASIC_UU_DEC;
-		}
-		g_free (encoding);
-		
-		if (enctype != 0) {
-			fdec = (CamelMimeFilter *)camel_mime_filter_basic_new_type(enctype);
-			decid = camel_mime_parser_filter_add (mp, fdec);
-		}
-	}
-	
-	/* If we're doing text, we also need to do CRLF->LF and may have to convert it to UTF8 as well. */
-	ct = camel_mime_parser_content_type (mp);
-	if (header_content_type_is (ct, "text", "*")) {
-		charset = header_content_type_param (ct, "charset");
-		charset = e_iconv_charset_name (charset);
-		
-		if (fdec) {
-			d(printf ("Adding CRLF conversion filter\n"));
-			fcrlf = camel_mime_filter_crlf_new (CAMEL_MIME_FILTER_CRLF_DECODE,
-							    CAMEL_MIME_FILTER_CRLF_MODE_CRLF_ONLY);
-			crlfid = camel_mime_parser_filter_add (mp, fcrlf);
-		}
-	}
 	
 	/* read in the entire content */
 	buffer = g_byte_array_new ();
@@ -334,86 +74,16 @@ simple_data_wrapper_construct_from_parser (CamelDataWrapper *dw, CamelMimeParser
 		g_byte_array_append (buffer, buf, len);
 	}
 	
-	/* check for broken Outlook/Web mailers that like to send html marked as text/plain */
-	if (header_content_type_is (ct, "text", "plain")) {
-		register const unsigned char *inptr;
-		const unsigned char *inend;
-		
-		inptr = buffer->data;
-		inend = inptr + buffer->len;
-		
-		while (inptr < inend && isspace ((int) *inptr))
-			inptr++;
-
-		if (((inend-inptr) > 5 && g_ascii_strncasecmp(inptr, "<html", 5) == 0)
-		    || ((inend-inptr) > 9 && g_ascii_strncasecmp(inptr, "<!doctype", 9) == 0)) {
-			/* re-tag as text/html */
-			g_free (ct->subtype);
-			ct->subtype = g_strdup ("html");
-		}
-	}
-	
-	/* Possible Lame Mailer Alert... check the META tags for a charset */
-	if (!charset && header_content_type_is (ct, "text", "html")) {
-		if ((charset = check_html_charset (buffer->data, buffer->len)))
-			header_content_type_set_param (ct, "charset", charset);
-	}
-	
-	/* if we need to do charset conversion, see if we can/it works/etc */
-	if (charset && !(strcasecmp (charset, "us-ascii") == 0
-			 || strcasecmp (charset, "utf-8") == 0
-			 || strncasecmp (charset, "x-", 2) == 0)) {
-		GByteArray *out;
-		
-		/* You often see Microsoft Windows users announcing their texts
-		 * as being in ISO-8859-1 even when in fact they contain funny
-		 * characters from the Windows-CP1252 superset.
-		 */
-		charset = canon_charset_name (charset);
-		if (!strncasecmp (charset, "iso-8859", 8)) {
-			/* check for Windows-specific chars... */
-			if (broken_windows_charset (buffer, charset))
-				charset = camel_charset_iso_to_windows (charset);
-		}
-		
-		out = convert_buffer (buffer, "UTF-8", charset);
-		if (out) {
-			/* converted ok, use this data instead */
-			g_byte_array_free(buffer, TRUE);
-			dw->rawtext = FALSE;
-			buffer = out;
-		} else {
-			/* else failed to convert, leave as raw? */
-			g_warning("Storing text as raw, unknown charset '%s' or invalid format", charset);
-			dw->rawtext = TRUE;
-		}
-	} else if (header_content_type_is (ct, "text", "*")) {
-		if (charset == NULL || !strcasecmp (charset, "us-ascii")) {
-			/* check that it's 7bit */
-			dw->rawtext = !is_7bit (buffer);
-		} else if (!strncasecmp (charset, "x-", 2)) {
-			/* we're not even going to bother trying to convert, so set the
-			   rawtext bit to TRUE and let the mailer deal with it. */
-			dw->rawtext = TRUE;
-		} else if (!strcasecmp (charset, "utf-8") && buffer->len) {
-			/* check that it is valid utf8 */
-			dw->rawtext = !g_utf8_validate (buffer->data, buffer->len, NULL);
-		}
-	}
-	
 	d(printf("message part kept in memory!\n"));
 	
-	mem = camel_stream_mem_new_with_byte_array(buffer);
-	camel_data_wrapper_construct_from_stream(dw, mem);
-	camel_object_unref((CamelObject *)mem);
-
-	camel_mime_parser_filter_remove(mp, decid);
-	camel_mime_parser_filter_remove(mp, crlfid);
+	mem = camel_stream_mem_new_with_byte_array (buffer);
+	camel_data_wrapper_construct_from_stream (dw, mem);
+	camel_object_unref (mem);
 	
-	if (fdec)
-		camel_object_unref((CamelObject *)fdec);
-	if (fcrlf)
-		camel_object_unref((CamelObject *)fcrlf);
+	if (encoding) {
+		dw->encoding = camel_mime_part_encoding_from_string (encoding);
+		g_free (encoding);
+	}
 }
 
 /* This replaces the data wrapper repository ... and/or could be replaced by it? */
@@ -424,7 +94,7 @@ camel_mime_part_construct_content_from_parser (CamelMimePart *dw, CamelMimeParse
 	CamelContentType *ct;
 	
 	ct = camel_mime_parser_content_type (mp);
-
+	
 	switch (camel_mime_parser_state (mp)) {
 	case HSCAN_HEADER:
 		d(printf("Creating body part\n"));
@@ -457,19 +127,11 @@ camel_mime_part_construct_content_from_parser (CamelMimePart *dw, CamelMimeParse
 	default:
 		g_warning("Invalid state encountered???: %d", camel_mime_parser_state (mp));
 	}
+	
 	if (content) {
 		/* would you believe you have to set this BEFORE you set the content object???  oh my god !!!! */
 		camel_data_wrapper_set_mime_type_field (content, camel_mime_part_get_content_type (dw));
 		camel_medium_set_content_object ((CamelMedium *)dw, content);
-		
-		/* Note: we don't set ct as the content-object's mime-type above because
-		 * camel_medium_set_content_object() may re-write the Content-Type header
-		 * (see CamelMimePart::set_content_object) if we did that (which is a Bad Thing).
-		 * However, if we set it *afterward*, we can still use any special auto-detections
-		 * that we found in simple_data_wrapper_construct_from_parser(). This is important
-		 * later when we go to render the MIME parts in mail-format.c */
-		camel_data_wrapper_set_mime_type_field (content, ct);
-		
 		camel_object_unref (content);
 	}
 }
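After this change, simple_data_wrapper_construct_from_parser() no longer installs decode filters at parse time. It stores the raw content and records the transfer encoding on the wrapper through camel_mime_part_encoding_from_string(), so decoding can happen later. The body of that Camel function is not part of this diff; the following is only a minimal, self-contained sketch of what such a string-to-enum lookup might look like. The SketchEncoding enum, the sketch_encodings table and sketch_encoding_from_string() are hypothetical names invented for illustration and are not Camel's API.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Hypothetical stand-in for an encoding enum; not Camel's actual type. */
typedef enum {
	SKETCH_ENCODING_DEFAULT,
	SKETCH_ENCODING_7BIT,
	SKETCH_ENCODING_8BIT,
	SKETCH_ENCODING_BASE64,
	SKETCH_ENCODING_QUOTEDPRINTABLE,
	SKETCH_ENCODING_BINARY,
	SKETCH_ENCODING_UUENCODE
} SketchEncoding;

/* Content-Transfer-Encoding tokens paired with their enum values. */
static const struct {
	const char *name;
	SketchEncoding encoding;
} sketch_encodings[] = {
	{ "7bit",             SKETCH_ENCODING_7BIT },
	{ "8bit",             SKETCH_ENCODING_8BIT },
	{ "base64",           SKETCH_ENCODING_BASE64 },
	{ "quoted-printable", SKETCH_ENCODING_QUOTEDPRINTABLE },
	{ "binary",           SKETCH_ENCODING_BINARY },
	{ "x-uuencode",       SKETCH_ENCODING_UUENCODE }
};

/* Map a header token to an enum value, comparing case-insensitively.
 * NULL and unrecognised tokens fall back to DEFAULT, which in this
 * sketch means "leave the content exactly as it was read". */
SketchEncoding
sketch_encoding_from_string (const char *name)
{
	size_t i;

	if (name == NULL)
		return SKETCH_ENCODING_DEFAULT;

	for (i = 0; i < sizeof (sketch_encodings) / sizeof (sketch_encodings[0]); i++) {
		if (strcasecmp (name, sketch_encodings[i].name) == 0)
			return sketch_encodings[i].encoding;
	}

	return SKETCH_ENCODING_DEFAULT;
}
```

Falling back to a DEFAULT value for unknown tokens matches the spirit of the source comment "we dont care about what we dont know about": unrecognised encodings are stored untouched rather than rejected.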