Community:
I have @330 htm pages that display wonderfully on Win XP under 4.01 Strict. No errors per w3C validator.
They won't even come close to proper display on Fedora 14. I can get them validated successfully on F14 through w3c Validator, but I am seeing error console reports in Firefox about "such-and-such function is not defined". The weird part is that it only picks out selective functions to not find in the *.js script, other functions in the *.js script do not generate errors.
I did a search in Bugzilla for Firefox and multiple permutations of what I thought the error was and didn't see anything.
I am looking for suggestions as to where to start digging on this one as I don't have anything worth considering to be a bug at this point. I am hard-pressed to believe that Windows XP is okay and Fedora is not on the same *.htm page ... but that's all I can see at this point.
Thanks in advance, Paul
ps: the reason there is no attachment with this query is that I don't have anything clean enough to present and, if an initial query to this list doesn't work, I will try to do surgery to come up with an example
2011/8/8, Paul Allen Newell pnewell@cs.cmu.edu:
Community:
I have @330 htm pages that display wonderfully on Win XP under 4.01 Strict. No errors per w3C validator.
They won't even come close to proper display on Fedora 14. I can get them validated successfully on F14 through w3c Validator, but I am seeing error console reports in Firefox about "such-and-such function is not defined". The weird part is that it only picks out selective functions to not find in the *.js script, other functions in the *.js script do not generate errors.
w3c's html validator is unlikely to signal problems with your javascript (and no validator could if the problem is not a syntactic one).
I did a search in Bugzilla for Firefox and multiple permutations of what I thought the error was and didn't see anything.
I am looking for suggestions as to where to start digging on this one as I don't have anything worth considering to be a bug at this point. I am hard-pressed to believe that Windows XP is okay and Fedora is not on the same *.htm page ... but that's all I can see at this point.
It's probably not a Win XP vs Fedora but an IE vs Firefox question. Have you tried FF on Win XP? Or other browsers on Fedora?
Andras
Paul Allen Newell writes:
Community:
I have @330 htm pages that display wonderfully on Win XP under 4.01 Strict. No errors per w3C validator.
They won't even come close to proper display on Fedora 14. I can get them validated successfully on F14 through w3c Validator, but I am seeing error console reports in Firefox about "such-and-such function is not defined". The weird part is that it only picks out selective functions to not find in the *.js script, other functions in the *.js script do not generate errors.
W3C does not validate Javascript, only HTML.
If you have Javascript that only works on MSIE, then that's what you have: Javascript that only works on MSIE.
I did a search in Bugzilla for Firefox and multiple permutations of what I thought the error was and didn't see anything.
I am looking for suggestions as to where to start digging on this one as
Start digging in your Javascript code. The fact that Firefox is complaining about various Javascript functions is your big, honking clue.
On Mon, 2011-08-08 at 09:42 +0200, Andras Simon wrote:
w3c's html validator is unlikely to signal problems with your javascript (and no validator could if the problem is not a syntactic one).
I'll go further and say that it won't. It's not just unlikely. It looks at HTML not JavaScript. It might check whether you've called JavaScript in a syntactically correct way (e.g. that you've put your OnMouse-whatever's into the right part of the HTML elements), but not the functions that are in your JavaScript.
Their HTML validator checks HTML. Their CSS validator checks CSS. They don't have a validator for JavaScript, and I've not heard of anyone that does (which goes some way to explaining the huge amount of crap JavaScript on the WWW that just doesn't work in my browser - because there is no standard test for JavaScript, and authors just dream up whatever seems to work on the browser they're playing with). There are standardised ECMA scripts, but browsers do their own thing with their own scripting, and authors are still stuck playing that silly game of having to code differently for specific browsers.
Additionally, if the original poster wants more eyes looking at their problem, they really need to supply some samples of the problems.
As others have said, it's most likely a browser issue. JavaScript nearly always is (that, or an authoring error). There are news groups that deal with web authoring that might be your best bet, but put on your flameproof suit, they'll be far more critical than I've been.
On 8/8/2011 12:42 AM, Andras Simon wrote:
w3c's html validator is unlikely to signal problems with your javascript (and no validator could if the problem is not a syntactic one).
Andras:
Thanks for reply.
Regarding the validator, your comment was/is understood before I wrote my email ... I mentioned it only to ensure that I wasn't tripping up on bad html that validator would pick up. Using 4.01 Strict, if it matters.
It's probably not a Win XP vs Fedora but an IE vs Firefox question. Have you tried FF on Win XP? Or other browsers on Fedora?
Andras
Everything is in Firefox on Win XP and F14 (don't want cliched apples and oranges problem by dealing with IE). Both systems are running Firefox 3.6.18 (Windows XP is 32bit, F14 is both 32 and 64).
Paul
On 8/8/2011 4:12 AM, Sam Varshavchik wrote:
Start digging in your Javascript code. The fact that Firefox is complaining about various Javascript functions is your big, honking clue.
Sam:
Thanks for reply.
As mentioned in a prior response to Andras, since WinXP and F14 are both using Firefox 3.6.18, I am still looking for the big honking clue about the difference.
I am getting prepared to create a simple test case with the hopes that the problem will show itself as I strip down the actual html / javascript (I was really hoping there was going to be a "oh, you need to such-and-such" suggestion ... I figured it was worth the try asking)
Paul
On 8/8/2011 4:19 AM, Tim wrote:
As others have said, it's most likely a browser issue. JavaScript nearly always is (that, or an authoring error). There are news groups that deal with web authoring that might be your best bet, but put on your flameproof suit, they'll be far more critical than I've been.
Tim:
Thanks for reply ... I have the flameproof suit ready when needed (plus plate and silverware if eating crow is necessary)
Paul
Something is still
On Tue, Aug 9, 2011 at 6:37 AM, Paul Allen Newell pnewell@cs.cmu.edu wrote:
On 8/8/2011 12:42 AM, Andras Simon wrote:
w3c's html validator is unlikely to signal problems with your javascript (and no validator could if the problem is not a syntactic one).
Andras:
Thanks for reply.
Regarding the validator, your comment was/is understood before I wrote my email ...
Not quite, perhaps.
I mentioned it only to ensure that I wasn't tripping up on bad html that validator would pick up. Using 4.01 Strict, if it matters.
Well, outside the fact that even strictly standard html provides hooks for conformant ways to add non-conformant tags, yeah, conformance matters.
The web standards have been open-ended from the beginning on purpose. It's a kind of hidden sub-text in the discussions, one of those proverbial elephants in the room. (Confused me for a long time, too.)
It's probably not a Win XP vs Fedora but an IE vs Firefox question. Have you tried FF on Win XP? Or other browsers on Fedora?
Andras
Everything is in Firefox on Win XP and F14 (don't want cliched apples and oranges problem by dealing with IE). Both systems are running Firefox 3.6.18 (Windows XP is 32bit, F14 is both 32 and 64).
Paul
Direct-X?
Even two distinct installs of Fedora 14 are likely to have distinct sets of libraries installed, and the java/ECMAscript interface to the OS libraries is a bit fuzzy.
It goes without saying that you must have checked that you have the same set of add-ons loaded in each. Right?
Shoot. Without a look at your source code, I would be hard-pressed to even suggest a proper forum for you among those that are dedicated to the various ways to mix HTML, CSS, ECMAscript, server-side tech, and so forth.
Joel Rees
Joel:
Thanks for reply ... my answers(?) inline
On 8/8/2011 3:30 PM, Joel Rees wrote:
Regarding the validator, your comment was/is understood before I wrote my email ...
Not quite, perhaps.
I am prepared to discover my understanding is not as good as I thought it was (smile)
Well, outside the fact that even strictly standard html provides hooks for conformant ways to add non-conformant tags, yeah, conformance matters.
The web standards have been open-ended from the beginning on purpose. It's a kind of hidden sub-text in the discussions, one of those proverbial elephants in the room. (Confused me for a long time, too.)
I am being very careful to not include outside material and/or non-conformant tags. Back and forth to the w3c docs to make sure I am doing things correct. I might have missed something ... and that may be the honking clue I am looking for ... but don't see it yet
Direct-X?
Oh, groan, never thought of that ...
Even two distinct installs of Fedora 14 are likely to have distinct sets of libraries installed, and the java/ECMAscript interface to the OS libraries is a bit fuzzy.
It goes without saying that you must have checked that you have the same set of add-ons loaded in each. Right?
Remember I mentioned having the plate and silverware ready for potential eating of crow ... well, of course I didn't check that and I am preparing the crow.
Looking at WinXP, it has extensions Java Console 6.0.26 and Java Quick Starter 1.0. It has plug-ins Java Deployment Toolkit 6.0.260.3 and Java (tm) Platform SE6 U26 6.0.260.3. The Linux has nothing with the word Java in it. Can I at least cook the crow or is this one that requires "raw with feathers"?
I gotta dig to figure out what these items are and which are needed ... I have this memory many years ago of accepting a java something for XP Firefox and not thinking anything more about it
Shoot. Without a look at your source code, I would be hard-pressed to even suggest a proper forum for you among those that are dedicated to the various ways to mix HTML, CSS, ECMAscript, server-side tech, and so forth.
As I mentioned before, I sent the original email to see if I missed something obvious (and I think that paid off with your reply). My next step was to create a test html/javascript to duplicate by reduction of original (and make sure that I wasn't doing something dumb in my original).
Let me try looking at the add-ons and, if that doesn't do it, I will create the example.
Joel Rees
Paul Allen Newell writes:
On 8/8/2011 4:12 AM, Sam Varshavchik wrote:
Start digging in your Javascript code. The fact that Firefox is complaining about various Javascript functions is your big, honking clue.
Sam:
Thanks for reply.
As mentioned in a prior response to Andras, since WinXP and F14 are both using Firefox 3.6.18, I am still looking for the big honking clue about the difference.
The browsers are probably exposing some OS specific resources to Javascript. When the Javascript can't find something OS specific that it's looking for, it dies.
I am getting prepared to create a simple test case with the hopes that the problem will show itself as I strip down the actual html / javascript (I was really hoping there was going to be a "oh, you need to such-and-such" suggestion ... I figured it was worth the try asking)
Paul
On Tue, Aug 9, 2011 at 7:51 AM, Paul Allen Newell pnewell@cs.cmu.edu wrote:
Joel:
[...] As I mentioned before, I sent the original email to see if I missed something obvious (and I think that paid off with your reply). My next step was to create a test html/javascript to duplicate by reduction of original (and make sure that I wasn't doing something dumb in my original).
Let me try looking at the add-ons and, if that doesn't do it, I will create the example.
Joel Rees
Source code tastes much better than crow, anyway. 8-*
You're on the right track. Putting the examples together will probably tell you which forum is best to ask at, if it doesn't pinpoint the problems.
One point you might need to drag back out of the past: in spite of the similarity in names, Java and JavaScript/ECMAScript have only the barest of historical connections. (Passed each other in a storm many years ago.:^/)
Joel Rees
On Mon, 2011-08-08 at 19:01 -0400, Sam Varshavchik wrote:
The browsers are probably exposing some OS specific resources to Javascript. When the Javascript can't find something OS specific that it's looking for, it dies.
Hence why relying on it is nearly always a bad idea. Sure, there's some very basic commands that are probably going to work on most browsers. But there's plenty of things that are only going to work on some browsers, and they'll probably be the ones that the author will try to use (because that's how *luck* works).
Having said what I said earlier about not knowing of any JavaScript validators, I noticed some listed when Google searching for them afterwards. But I know nothing about whether they're reliable, in themselves. And it's certainly not going to help when it comes to issuing commands that can only work in some browsers.
The HTML war was fought long ago, and eventually came through with more people willing to adhere to specifications. Though it's taken a hell of a lot of convincing. And the WWW is more compatible with most things now, than it was.
The JavaScript bitch fight has continued as a scrum. Yes, some standards were written long ago. But JavaScript was a proprietary baby. The non-proprietary ECMA script /standard/ is barely mentioned (and lack of proper validators hasn't helped). And browser writers have always shovelled in their own special tricks, even more so with scripting than they did with flat HTML. Not to mention expansion of features with plug-ins (some page authors just don't get that not all plug-ins are available for all browsers, nor can some people install them, even if available).
You need to learn about different types of browsers. For instance, you're not too likely to be able to pop up other windows on a browser running in a mobile phone. So coding up a convoluted site limits how it can be used by the public, perhaps making it completely unusable.
For anybody dabbling with scripting, I'd advise trying to find out about compatibilities (what's common, what's browser specific). And it'll be almost unavoidable that you'll be doing lots of conditional scripting for different browsers, because you'll probably want some of those browser specific functions, having to work out two or three different ways to do what you wanted to do.
You also lose out with search engines, if you depend on scripting to get through your site. Most people find out about a site through a search engine, it's not good to exclude yourself from that working.
I gave up attempting to work through the scripting nightmare / browser war, long ago. My websites have no scripting.
On 8/8/2011 5:56 PM, Tim wrote:
For anybody dabbling with scripting, I'd advise trying to find out about compatibilities (what's common, what's browser specific).
First, my thanks to everyone who offered help / suggestions.
After checking the add-ons suggestions, I came to the conclusion that it was a red herring as that's all about java / openJDK / icedtea. Simple javascripts worked and F14 Firefox has preferences->content->EnableJavaScript checked on.
So, with a groan, I started stripping down my html / javascript to get an example. And, to once again borrow the language of one of the replies, I suddenly heard the loudest honking possible.
It was a stupid typo that I usually know to check for given these web pages were written a long time ago on Windows 98se ... and I am stalling on admitting it as I feel very stupid for posting before I caught it.
One of the f-ing scripts was still assuming case-insensitivity for directory locations (shaking my head as I say this). How I missed this one and caught all the other Windows case-insensitive legacies is nothing short of the classic tree / forest blindness. I've even written a Python script to scan all scripts / code / data to try to root out every one ... I just missed one case (I was too clever for my own good in the 1990's when I wrote the html/javascript)
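A minimal sketch of the kind of scan Paul describes, for readers chasing the same problem. It is not his script: the function names, the `src`/`href` regular expression, and the set of file extensions are assumptions made for illustration. The idea is to compare every local reference against the exact names on disk, component by component, so that a reference such as `Scripts/menu.js` is flagged when the directory is really `scripts`.

```python
import os
import re
from pathlib import Path

# Hypothetical pattern: captures src="..." and href="..." values.
# A simplification; paths built up dynamically in JavaScript will be missed.
REF_PATTERN = re.compile(r'''(?:src|href)\s*=\s*["']([^"'#?]+)''', re.IGNORECASE)


def exact_case_exists(base: Path, relative: str) -> bool:
    """Return True only if each component of `relative` exists under `base` with identical case."""
    current = base
    for part in Path(relative).parts:
        if part == "..":
            current = current.parent
            continue
        try:
            names = os.listdir(current)  # real on-disk names, even on case-insensitive systems
        except OSError:
            return False
        if part not in names:
            return False
        current = current / part
    return True


def find_case_mismatches(root: Path):
    """Yield (file, reference) pairs whose local reference does not match the on-disk spelling."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".htm", ".html", ".js"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for ref in REF_PATTERN.findall(text):
            # Skip external links and site-absolute paths; only relative references are checked.
            if "://" in ref or ref.startswith(("mailto:", "javascript:", "/")):
                continue
            if not exact_case_exists(path.parent, ref):
                yield path, ref
```

Calling `find_case_mismatches(Path("site"))` on a copy of the pages (the directory name is an assumption) lists each file together with the reference that would fail on a case-sensitive filesystem. A reference to a file that is missing altogether is reported as well, since it cannot be told apart from a case mismatch without a second, case-insensitive lookup.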
My apologies to all and, once again, thanks for all the tips !!! Paul
On Mon, 2011-08-08 at 18:15 -0700, Paul Allen Newell wrote:
assuming case-insensitivity
With the Apache web server, there is a "helpful" option that does work around some spelling/typing errors, doing its best to try and do what you meant, rather than what you actually did (changing case, and working around one or two mistyped letters, if there's a fairly obvious match). Though, I tend to avoid it, as it leads to future problems - such as uploading your site onto another server that doesn't have that feature.
Some time ago I made a policy decision that all URIs, and associated file names, should avoid the use of the shift key, completely. All letters will be lower case, and any punctuation limited to things that didn't, normally, need the shift key, and people won't mistype because they didn't know what it was (no _ underscores, all dashes are single hyphens, and using the same hyphen between words you'd, otherwise, space apart). Also, to try and use words in addresses that don't usually need spelling or explaining to people (they should be easy to dictate over the phone). Hyphenating, instead of running words together, helps to avoid some problems where people can't read a word, or you inadvertently create a rude, or otherwise objectionable, word.
It's what you might call a "lazy typing" rule. Both for me, and for anybody else who might have to type in a URI. Most people will type everything in all lower case, and it's easier to read out a URI if you can state the address, and say "all lower case," rather than try to say how to type in some CamelCaseTyped portion of an address.
Once you've set yourself a rule, it's easier to be consistent throughout your site. Particularly if you put enough thought into it, ahead of time, that you won't need to break it.
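The "lazy typing" rule above can be sketched as a small helper. This is an illustration under assumptions, not Tim's own tooling: the name `to_lazy_slug` is made up, and the sketch treats anything outside `a-z` and `0-9` as punctuation, so accented or non-Latin letters are collapsed into hyphens too.

```python
import re


def to_lazy_slug(title: str) -> str:
    """Lower-case a title and collapse each run of other characters into a single hyphen."""
    # Everything that is not a lower-case ASCII letter or digit becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


print(to_lazy_slug("The Taming of the Shrew"))  # the-taming-of-the-shrew
```

The result needs no shift key to type and can be read out over the phone as "all lower case, words separated by hyphens".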
On 8/9/2011 6:17 AM, Tim wrote:
On Mon, 2011-08-08 at 18:15 -0700, Paul Allen Newell wrote:
assuming case-insensitivity
[...]
Once you've set yourself a rule, it's easier to be consistent throughout your site. Particularly if you put enough thought into it, ahead of time, that you won't need to break it.
Tim:
Over the decades I've certainly learned the value of rules. That being said, the rules evolve as I learn more and more. And then there are those which weren't thought out, such as case-sensitivity differences between Windows and Linux. I made the mistake of assuming that I'd never be migrating code between the two and was sloppy on Windows as I focused on dealing with 8.3 naming. Retrofitting, just like porting code, does have its few groans of "I missed that one".
I do try to have rules which work globally for html, C++, Python, text, etc. rather than different rules for different uses. Obviously, can't do this all the time (and in some cases, rarely does it hold any time), but it strikes me better to first try to view top down.
Yours is a good rule, though I accept the shift key as a fact of life (this letter is evidence of that; otherwise, capitalization would be lost). I prefer underbar to hyphen for space as hyphen is a legit character, but I can see that such is just a matter of personal choice. I've worked in enough places whose C++ coding standards are camelBack that it is second nature to me (I only started using it because I had to ... you are right about the problems with what can be created if camelBack is applied without a good proof-read).
I wish the Chicago Manual of Style would weigh in on url name conventions (not to mention typography in code). Not that I'd agree with them, but it would be a good starting point.
Thanks, Paul
On Wed, Aug 10, 2011 at 5:53 AM, Paul Allen Newell pnewell@cs.cmu.edu wrote:
[...] I wish the Chicago Manual of Style would weigh in on url name conventions (not to mention typography in code). Not that I'd agree with them, but it would be a good starting point.
One of the problems with style relative to urls is that urls are intended to be human readable in an international context.
The Chicago Manual of Style is USA-English centric.
urls themselves were invented within the same large linguistic context, and in spite of intent, reflect the context. Case sensitivity as a distinguishing/non-distinguishing factor in names is a case in point, where mapping features of one language/culture to another does not produce mechanical equivalence. Tell Toto we're not in Kansas any more, etc.
I guess what I'm trying to say is that we shouldn't be surprised by code that used to work and doesn't any more, nor should we be surprised by seemingly trivial errors being the blockers. Especially when they used not to really be errors.
Joel Rees
On 8/9/2011 5:10 PM, Joel Rees wrote:
On Wed, Aug 10, 2011 at 5:53 AM, Paul Allen Newell pnewell@cs.cmu.edu wrote:
[...] I wish the Chicago Manual of Style would weigh in on url name conventions (not to mention typography in code). Not that I'd agree with them, but it would be a good starting point.
One of the problems with style relative to urls is that urls are intended to be human readable in an international context.
The Chicago Manual of Style is USA-English centric.
urls themselves were invented within the same large linguistic context, and in spite of intent, reflect the context. Case sensitivity as a distinguishing/non-distinguishing factor in names is a case in point, where mapping features of one language/culture to another does not produce mechanical equivalence. Tell Toto we're not in Kansas any more, etc.
Two points (that being a language/cultural mapping feature in and of itself).
I don't have a good reply ... my "wish" only holds valid within a particular localized language context. To what extent cultural issues are part of or separate to that "language context" is way outside my domain expertise.
As to internationalization of URL, I did some reading on IRI and I have no idea how I would come up with "rules" to handle such. I do not have a website so my usage is limited to my system(s) for data that needs links et al ... with all due respect for those having to deal with such, it doesn't seem like a problem I need to solve. Please don't take this as a USA-centric dismissal of the need for a global solution(s), it's more like only going to the hardware store to buy the tools you need rather than buying every tool in case you might need it.
I guess what I'm trying to say is that we shouldn't be surprised by code that used to work and doesn't any more, nor should we be surprised by seemingly trivial errors being the blockers. Especially when they used not to really be errors.
Joel Rees
Agreed.
Thanks for the food for thought, Paul
On Tue, 2011-08-09 at 13:53 -0700, Paul Allen Newell wrote:
Yours is a good rule, though I accept the shift key as a fact of life (this letter is evidence of that; otherwise, capitalization would be lost). I prefer underbar to hyphen for space as hyphen is a legit character, but I can see that such is just a matter of personal choice.
The shift key is a necessary evil, but it's beyond the comprehension of some people. They don't understand how to type various symbols, or even recognise what some of them are. And just look at the emails you'll see that are written in all lower case, or with the caps lock stuck on. Leading to another problem - some people can't comprehend the difference between caps lock and shift.
So my rule was to make it easy, very easy, for everyone else. Make it easy to say over the phone or radio, without needing complex instructions. That also goes down to choosing the words that you use carefully, too (spelling, familiar words, not jamming words together which double-up letters in confusing places, as far as typing is concerned - consider sports-stars versus sportsstars).
I used to use the underscore, as it made sense (to me, and other programmers) as a substitute for a space. But there's two drawbacks:
1. Try explaining to the clueless what an underscore is, and how to type it. Try doing that again and again, and you get real sick of it.
2. You have the messy combinations of punctuation such as:
Shakespeare_-_The_Taming_of_the_Shrew
Where it'd really be better to collapse all punctuation down to just one punctuation symbol. That's "better" as in "easier and more convenient," not more lexically correct. Remember these are URIs (i.e. codes), not general language.
3. If you ever want a URI printed on a newspaper or magazine, whoever types it may not be able to get an underscore into the text, unless they're familiar with how their publishing system works. And, even then, they may fail. Many of them will convert an underscore into an EM dash, since an underscore is hardly ever desired in print, yet proper dashes are wanted all the time.
When it comes to the web, people (mostly) find you through search engines, where they don't have to type URIs, but do have to be able to think of the keywords that will find you. Or, you have to think of the keywords that people might use while trying to find your product or information. Then there's personal referrals, and you want such addresses to be typed error-free. And verbal referrals, be that person-to-person, or broadcast, where such URIs need be as close to phonetic as you can manage, so it's said error-free, and listeners can type it error-free by what they presume they heard.
On 08/10/2011 08:37 AM, Tim wrote:
The shift key is a necessary evil, but it's beyond the comprehension of some people. They don't understand how to type various symbols, or even recognise what some of them are. And just look at the emails you'll see that are written in all lower case, or with the caps lock stuck on. Leading to another problem - some people can't comprehend the difference between caps lock and shift.
The content of their mail is probably not too interesting. :-)
I mean, being unable to type proper upper/lowercase and punctuation is well correlated with having confused or trivial ideas.
In that sense, a mail address such as _0oO-i_1lL@example.com could be considered a sort of spam defense... :-)
(I hate being politically correct)
Roberto Ragusa mail@robertoragusa.it wrote:
On 08/10/2011 08:37 AM, Tim wrote:
The shift key is a necessary evil, but it's beyond the comprehension of some people. They don't understand how to type various symbols, or even recognise what some of them are. And just look at the emails you'll see that are written in all lower case, or with the caps lock stuck on. Leading to another problem - some people can't comprehend the difference between caps lock and shift.
The content of their mail is probably not too interesting. :-)
I mean, being unable to type proper upper/lowercase and punctuation is well correlated with having confused or trivial ideas.
In that sense, a mail address such as _0oO-i_1lL@example.com could be considered a sort of spam defense... :-)
(I hate being politically correct)
Does that include giving a certain amount of leeway to folks whose first written language doesn't consist of an "alphabet"? :-)
Tim wrote:
I used to use the underscore, as it made sense (to me, and other programmers) as a substitute for a space. But there's two drawbacks:
1. Try explaining to the clueless what an underscore is, and how to type it. Try doing that again and again, and you get real sick of it.
2. You have the messy combinations of punctuation such as:
Shakespeare_-_The_Taming_of_the_Shrew
Where it'd really be better to collapse all punctuation down to just one punctuation symbol. That's "better" as in "easier and more convenient," not more lexically correct. Remember these are URIs (i.e. codes), not general language.
3. If you ever want a URI printed on a newspaper or magazine, whoever types it may not be able to get an underscore into the text, unless they're familiar with how their publishing system works. And, even then, they may fail. Many of them will convert an underscore into an EM dash, since an underscore is hardly ever desired in print, yet proper dashes are wanted all the time.
4. Host Names (or 'labels' in DNS jargon) as traditionally defined by RFC 952 and RFC 1123 may be composed of upper and lower case characters, numeric characters, and the dash character. RFC 2181 significantly liberalized the valid character set including the use of "_" (underscore), but it is still a *good idea* to stick to the traditionally defined characters[¹].
Mixing dashes and underscores in URLs is sloppy looking at best and confusing at worst, so using the dash because it is well-supported in host names is a good practice.
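As a rough illustration of the traditional character set described above, the check below accepts only letters, digits, and the dash in each label. The function name, the regular expression, and the 63-character label and 253-character name limits are choices made for this sketch (the limits are the usual DNS ones); it is not taken from the referenced page.

```python
import re

# One label: letters, digits and hyphens only, 1 to 63 characters,
# neither starting nor ending with a hyphen (RFC 1123 permits a leading digit).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_traditional_hostname(name: str) -> bool:
    """Return True if every dot-separated label uses only the traditional characters."""
    if name.endswith("."):
        name = name[:-1]  # tolerate one trailing dot (fully qualified form)
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


print(is_traditional_hostname("my-site.example.com"))  # True
print(is_traditional_hostname("my_site.example.com"))  # False
```

A name containing an underscore may still resolve under the liberalized rules; the sketch only reports whether it stays inside the conservative set.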
¹ http://www.zytrax.com/books/dns/apa/names.html
Regards,
Matthew Roth InterMedia Marketing Solutions Software Engineer and Systems Developer
On Wed, 2011-08-10 at 09:40 -0500, Matthew J. Roth wrote:
Tim wrote:
I used to use the underscore, as it made sense (to me, and other programmers) as a substitute for a space. But there's two drawbacks:
1. Try explaining to the clueless what an underscore is, and how to type it. Try doing that again and again, and you get real sick of it.
2. You have the messy combinations of punctuation such as:
Shakespeare_-_The_Taming_of_the_Shrew
Where it'd really be better to collapse all punctuation down to just one punctuation symbol. That's "better" as in "easier and more convenient," not more lexically correct. Remember these are URIs (i.e. codes), not general language.
3. If you ever want a URI printed on a newspaper or magazine, whoever types it may not be able to get an underscore into the text, unless they're familiar with how their publishing system works. And, even then, they may fail. Many of them will convert an underscore into an EM dash, since an underscore is hardly ever desired in print, yet proper dashes are wanted all the time.
4. Host Names (or 'labels' in DNS jargon) as traditionally defined by RFC 952 and RFC 1123 may be composed of upper and lower case characters, numeric characters, and the dash character. RFC 2181 significantly liberalized the valid character set including the use of "_" (underscore), but it is still a *good idea* to stick to the traditionally defined characters[¹].
It's become much worse than that with new classes of labels allowing non-ASCII character sets. See http://tools.ietf.org/html/rfc5890
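To make the non-ASCII label point concrete, here is a small sketch of how such a name is carried as plain ASCII on the wire. One caveat, stated loudly: Python's built-in `idna` codec implements the older IDNA 2003 rules (RFC 3490), not the RFC 5890 family linked above, so treat it as an approximation. The helper name `to_ascii_hostname` is made up for this example.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible form.

    Uses Python's built-in "idna" codec, which follows IDNA 2003 (RFC 3490),
    not the newer IDNA 2008 rules of RFC 5890.
    """
    return hostname.encode("idna").decode("ascii")


print(to_ascii_hostname("bücher.example"))  # xn--bcher-kva.example
print(to_ascii_hostname("example.com"))     # example.com (already ASCII, unchanged)
```

The `xn--` prefix marks a label that was encoded this way; names that are already ASCII pass through untouched.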
poc
On Thu, Aug 11, 2011 at 1:46 AM, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Wed, 2011-08-10 at 09:40 -0500, Matthew J. Roth wrote:
Tim wrote:
I used to use the underscore, as it made sense (to me, and other programmers) as a substitute for a space. But there's two drawbacks:
1. Try explaining to the clueless what an underscore is, and how to type it. Try doing that again and again, and you get real sick of it.
2. You have the messy combinations of punctuation such as:
Shakespeare_-_The_Taming_of_the_Shrew
Where it'd really be better to collapse all punctuation down to just one punctuation symbol. That's "better" as in "easier and more convenient," not more lexically correct. Remember these are URIs (i.e. codes), not general language.
3. If you ever want a URI printed on a newspaper or magazine, whoever types it may not be able to get an underscore into the text, unless they're familiar with how their publishing system works. And, even then, they may fail. Many of them will convert an underscore into an EM dash, since an underscore is hardly ever desired in print, yet proper dashes are wanted all the time.
4. Host Names (or 'labels' in DNS jargon) as traditionally defined by RFC 952 and RFC 1123 may be composed of upper and lower case characters, numeric characters, and the dash character. RFC 2181 significantly liberalized the valid character set including the use of "_" (underscore), but it is still a *good idea* to stick to the traditionally defined characters[¹].
It's become much worse than that with new classes of labels allowing non-ASCII character sets. See http://tools.ietf.org/html/rfc5890
Which speaks to the problems of context that I brought up earlier.
Ten years ago, Japanese people who used the internet could (more or less) read English, and Latinized (romaji) spellings of Japanese used in urls didn't cause many problems either.
These days, ordinary Japanese people use the internet, and the latin basic set urls are just as meaningless as telephone numbers to them. Less, perhaps. (Yeah, they get force-fed English in primary grades, but that doesn't mean it is even comfortable for them to "read" -- and comprehend -- new combinations of romaji.)
On the other hand, simply allowing Kanji to be used in urls is going to create as many problems as it solves. It would be almost easy to fold hiragana and katakana, but not even possible to fold kanji and kana. As a result, the ads you see in trains tend to show the katakana or hiragana for a company's name in a search box, with the search button being clicked.
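The "almost easy" half of that claim can be sketched in a few lines, because the basic katakana block sits at a fixed offset from the hiragana block in Unicode. The function name and constants below are chosen for this illustration. Kanji pass through unchanged, which is exactly the part that cannot be folded mechanically.

```python
KATAKANA_FIRST = 0x30A1  # small katakana A
KATAKANA_LAST = 0x30F6   # small katakana KE
KANA_OFFSET = 0x60       # distance between the katakana and hiragana blocks


def fold_katakana_to_hiragana(text: str) -> str:
    """Map each basic katakana character to its hiragana counterpart; leave everything else alone."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


print(fold_katakana_to_hiragana("カタカナ"))  # かたかな
```

The prolonged sound mark and half-width katakana fall outside the range used here and are left as they are, so this is a starting point rather than a complete normalizer.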
As Paul points out, we should solve our problems in the local context first, since it's the one we best understand, and the one we probably need most to work in.
And then we try to figure out how to get things working in a broader context, and at some point we have to resort to a layer of translations (a human version of an API, perhaps?). And our minds tend to handle so much of this so well, that it's often a surprise how much detail you have to add to mechanical rules. And then there are problems that you just have to leave unsolved (and hope something works out), like the issues with Japanese in urls. And that's when there are no bugs.
(Sorry about the rant, but not sorry enough to refrain from posting it.)
Joel Rees
Lots of replies to read and digest ... thanks
Paul