Hi! Looking at the /var/mail/<someuser> file, I found that Subject fields are often in the wrong encoding, while every other part of the message is displayed correctly. Example: Subject: test =?koi8-r?Q?=D4=C5=D3=D4?= Are there any settings in the sendmail configuration to prevent this behavior? My system is: 2.6.32.21-168.fc12.i686 sendmail-8.14.4-3.fc12.i686 sendmail-cf-8.14.4-3.fc12.noarch
Hiisi wrote:
Looking at the /var/mail/<someuser> file, I found that Subject fields are often in the wrong encoding, while every other part of the message is displayed correctly. Example: Subject: test =?koi8-r?Q?=D4=C5=D3=D4?= Are there any settings in the sendmail configuration to prevent this behavior?
Why do you say this is incorrect? It looks like a properly encoded mail header that uses non-ascii characters, per RFC 2047.
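For readers unfamiliar with RFC 2047, an encoded word has the form =?charset?encoding?text?=, where the encoding is Q (quoted-printable style) or B (base64). The following is only a minimal sketch of how a single encoded word can be taken apart by hand; the function name decode_encoded_word and the regular expression are illustrative assumptions, not part of any library, and real headers should go through a proper MIME-aware tool.

```python
import base64
import quopri
import re

# Hypothetical helper pattern: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    """Decode one RFC 2047 encoded word into a Unicode string."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an RFC 2047 encoded word: {word!r}")
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        # B encoding is plain base64.
        raw = base64.b64decode(payload)
    else:
        # Q encoding: =XX is a hex byte, "_" stands for a space.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    # The bytes are only meaningful once interpreted in the declared charset.
    return raw.decode(charset)


print(decode_encoded_word("=?koi8-r?Q?=D4=C5=D3=D4?="))  # тест
```

So the Subject in the example is plain ASCII on the wire, and it only becomes readable Cyrillic once a client decodes it.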
On Fri, 2010-10-29 at 16:51 +0400, Hiisi wrote:
Looking at the /var/mail/<someuser> file, I found that Subject fields are often in the wrong encoding, while every other part of the message is displayed correctly. Example: Subject: test =?koi8-r?Q?=D4=C5=D3=D4?= Are there any settings in the sendmail configuration to prevent this behavior? My system is: 2.6.32.21-168.fc12.i686 sendmail-8.14.4-3.fc12.i686 sendmail-cf-8.14.4-3.fc12.noarch
What can be done in the headers (such as the subject line) is quite limited. A header has to pass, unscathed, through various different mail servers. There isn't really any meta-subject data to describe how the subject is encoded. To do something non-ASCII, you have to bodge everything into the subject line itself. Clients that can decode it will show what was intended to be seen; clients that cannot will show the encoding instructions literally. It's probably no more than an assumption that raw UTF-8 in the subject line will be interpreted correctly.
The message content is another matter entirely. It has headers that describe what the content is. Those headers are simple, but the content can be anything, so long as it is encoded in a way that passes through a 7-bit system.
On Fri, 2010-10-29 at 09:08 -0400, Todd Zullinger wrote: <--SNIP-->
Why do you say this is incorrect? It looks like a properly encoded mail header that uses non-ascii characters, per RFC 2047.
I use this command in a bash script: cat |grep Subject:|sed 's/Subject: //g'>$OUTFILE The script is invoked by procmail. $OUTFILE then contains something like this: =?utf-8?B?0YLQtdC80LAg0L/QuNGB0YzQvNCw?= I would like its contents to be readable. $OUTFILE contains only 7-bit ASCII characters, so converting it to a different encoding with iconv makes no difference. If the subject encoding is right, how can I recode it into something different? TIA
Hiisi wrote:
On Fri, 2010-10-29 at 09:08 -0400, Todd Zullinger wrote: <--SNIP--> I use this command in a bash script: cat |grep Subject:|sed 's/Subject: //g'>$OUTFILE
(UUOC there? ;)
The script is invoked by procmail. $OUTFILE then contains something like this: =?utf-8?B?0YLQtdC80LAg0L/QuNGB0YzQvNCw?= I would like its contents to be readable. $OUTFILE contains only 7-bit ASCII characters, so converting it to a different encoding with iconv makes no difference. If the subject encoding is right, how can I recode it into something different?
You need to use a tool that understands RFC 2047 and can decode the headers. I'd use python¹ to do this, but that's just my preference. Many languages should be able to do the job. Just not a simple cat, grep, and sed (which, btw, you could replace with one call to awk ;).
¹ http://docs.python.org/library/email.header.html
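As a rough illustration of the Python route mentioned above, the sketch below uses decode_header and make_header from the standard email.header module (Python 3). The wrapper name decode_subject is an assumption made for this example; the two sample values are the ones quoted earlier in the thread.

```python
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Turn an RFC 2047 encoded header value into a readable Unicode string."""
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() reassembles them into a Header that str() can render.
    return str(make_header(decode_header(raw)))


# Base64 ("B") form, as found in $OUTFILE.
print(decode_subject("=?utf-8?B?0YLQtdC80LAg0L/QuNGB0YzQvNCw?="))  # тема письма

# Mixed plain ASCII and a Q-encoded koi8-r word, as in the first message.
print(decode_subject("test =?koi8-r?Q?=D4=C5=D3=D4?="))
```

The result is a Python string, so writing it to a file in UTF-8 (or any other charset that can represent the characters) is then an ordinary encoding step.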
Hiisi wrote:
On Fri, 2010-10-29 at 09:08 -0400, Todd Zullinger wrote: <--SNIP-->
Why do you say this is incorrect? It looks like a properly encoded mail header that uses non-ascii characters, per RFC 2047.
I use this command in a bash script: cat |grep Subject:|sed 's/Subject: //g'>$OUTFILE The script is invoked by procmail. $OUTFILE then contains something like this: =?utf-8?B?0YLQtdC80LAg0L/QuNGB0YzQvNCw?= I would like its contents to be readable. $OUTFILE contains only 7-bit ASCII characters, so converting it to a different encoding with iconv makes no difference. If the subject encoding is right, how can I recode it into something different? TIA
You can use "reformime" from the maildrop package (present in Fedora):

    reformime -c UTF-8 -h "header"

    # Header keys are case-insensitive, thus "-i"
    reformime -c UTF-8 -h "$(grep -i '^Subject:' MailFile)"
Frantisek Hanzlík
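For comparison with the grep -i plus reformime approach, here is a hedged Python sketch that does both steps with the standard email package: it parses the message, looks up the Subject header (header names are matched case-insensitively by the parser), and decodes it. The function name subject_from_message and the sample message are assumptions made up for this example.

```python
import email
from email.header import decode_header, make_header


def subject_from_message(raw_message: str) -> str:
    """Return the decoded Subject of a raw message, or "" if there is none."""
    msg = email.message_from_string(raw_message)
    # Lookup is case-insensitive, so "subject:" and "Subject:" both match.
    raw_subject = msg["Subject"]
    if raw_subject is None:
        return ""
    return str(make_header(decode_header(raw_subject)))


# Hypothetical sample message with a lowercase header key.
sample = (
    "From: someone@example.com\n"
    "subject: =?utf-8?B?0YLQtdC80LAg0L/QuNGB0YzQvNCw?=\n"
    "\n"
    "Body text.\n"
)
print(subject_from_message(sample))  # тема письма
```

In a procmail recipe the message arrives on standard input, so a script along these lines would pass sys.stdin.read() to the function instead of the sample string.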
Frantisek Hanzlik wrote:
You can use "reformime" from the maildrop package (present in Fedora):

    reformime -c UTF-8 -h "header"

    # Header keys are case-insensitive, thus "-i"
    reformime -c UTF-8 -h "$(grep -i '^Subject:' MailFile)"
Nice. I thought about formail, but when I did a quick test I found it didn't decode the headers. The reformime utility sounds like a good tool to keep in mind. Thanks!
On Fri, 2010-10-29 at 15:28 -0400, Todd Zullinger wrote:
Frantisek Hanzlik wrote:
<--SNIP-->
Nice. I thought about formail, but when I did a quick test I found it didn't decode the headers. The reformime utility sounds like a good tool to keep in mind. Thanks!
Thank you for the clarification, Tim, Todd and Frantisek. The reformime solution works perfectly. As for this script, yes, I know... It would be better to use something more powerful and more efficient (e.g. python). Unfortunately my python skills are limited :-( I only need a temporary solution for about one month. But you never know, and temporary solutions often become permanent! P.S. To Todd (Useless Use of Cat Award): You've caught me!