DSL Ideas and Suggestions :: Damn Small Characters for Interlanguage Interchange



Just from my experience, I don't think adding a few extra characters will work. When I was using a Windows PC in an internet cafe in Kiev, it had a button labeled "ru" on the icon bar. This button switched the keyboard and font to Cyrillic. Hit the "en" button and it went back to English. You could blend Russian and English in the same Word document, so if I was telling someone about a local restaurant I could type the name as it was.
Quote
Your use of "lingua franca" illustrates the issue

No, it doesn't, and I didn't mean to set you off on another tangent. Users are dealing with this through extensions that provide native-language fonts. Some have already been submitted, iirc.

If you have some extension to submit for MyDSL, please do. I don't think we need, in the course of your project, every rambling digression and obtuse angels-on-pins discussion about standards that were never accepted. I don't think we need to adopt a new standard that, if it really will require lots of "research" (as you'd written previously but seem to have since edited out), sounds too much like reinventing the wheel. Using it in the base as a default seems kind of stupid, because the kludging likely required to get every single application to work with a novel character set would take up more space than it's worth. We already have settled standards, and we have native-language (NL) fonts available. The apps are already compiled for those standards. We should use those instead of reinventing the wheel.

Does this subject merit multiple threads and polls?

Quote
Just from my experience I don't think adding a few extra characters will work.

In the Linux/BSD/Unix/Solaris/OSX universe, there are myriad apps with myriad libraries those apps are compiled against. Most of those are based on set standards for languages. Introducing novel, non-standard character sets into such a fragmented environment creates more confusion than developers need to deal with.

Hi newby:

I don't think the slight added convenience of tone marks (Chinese pinyin) and diacritics (Indic languages) justifies the major trouble of a new charset (research, testing, everyday use).

If 2 people want to communicate using phonetic representation, I think there is a much simpler solution:

In the case of Chinese, the user can just use numbers (as in "1, 2, 3, 4") to represent tone:
http://en.wikipedia.org/wiki/Pinyin#Numbers_in_place_of_tone_marks
This way, 2 people can communicate in Chinese using pinyin, without the aid of a Chinese font.
This is easier than using tone marks, where the user must know which vowel takes the mark.
Numbers avoid that:
e.g.
Wo3 shi4 zhong1 wen2 xue2 sheng1.
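Because the rule for where the mark goes is mechanical, software can do the placement for you, which is another argument for typing numbers. A minimal Python sketch, assuming the usual placement rule (mark "a" or "e" if present, the "o" in "ou", otherwise the last vowel) and "v" or "u:" as the ASCII stand-in for "ü"; the function names are hypothetical, just for illustration:

```python
import re

# Tone-marked forms of each pinyin vowel, for tones 1-4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str, tone: int) -> str:
    """Put the tone mark for `tone` on the right vowel of one syllable."""
    # Assumed ASCII stand-ins for ü: "v" or "u:".
    syllable = syllable.replace("u:", "ü").replace("v", "ü")
    if tone not in (1, 2, 3, 4):
        return syllable  # 5 (or 0) = neutral tone, no mark
    lower = syllable.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        # Otherwise the mark goes on the last vowel.
        idx = max((i for i, c in enumerate(lower) if c in TONE_MARKS), default=-1)
        if idx == -1:
            return syllable  # no vowel to mark
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if syllable[idx].isupper():
        marked = marked.upper()
    return syllable[:idx] + marked + syllable[idx + 1:]


def numbers_to_marks(text: str) -> str:
    """Convert every letters+digit token, e.g. "Wo3 shi4" -> "Wǒ shì"."""
    return re.sub(
        r"([A-Za-zü:]+)([0-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        text,
    )


print(numbers_to_marks("Wo3 shi4 zhong1 wen2 xue2 sheng1."))
# prints: Wǒ shì zhōng wén xué shēng.
```

So two people can type with plain numbers, and a reader who prefers the marks can have them added afterwards.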

Similarly, 2 people can communicate in Japanese using Romaji, without the aid of a Japanese font.
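Going from kana to Romaji is largely a table lookup. A minimal Python sketch covering only the 46 basic hiragana in Hepburn style; voiced kana (が, ば), small-kana combinations (きゃ), the small っ and particle readings (は as "wa") are deliberately left out, so this is an illustration, not a complete converter:

```python
# Basic hiragana -> Hepburn romaji (gojuon only; an illustrative subset).
HIRAGANA = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "わ": "wa", "を": "o", "ん": "n",  # を is "o" in modern Hepburn
}


def to_romaji(text: str) -> str:
    """Replace each known hiragana; pass anything else through unchanged."""
    return "".join(HIRAGANA.get(ch, ch) for ch in text)


print(to_romaji("すし"), to_romaji("にほん"))
```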

Indic languages might be harder to transliterate/romanize:
Quote
[newby:]
The dravidian languages (Bengali, Hindi, et cetera) present the greatest difficulty due to a great number of diacritical and other marks.
Idea: There are ASCII-only transliteration schemes for Bengali/Hindi/Tamil. But because I don't know any of these languages at all, I don't know whether Bengali/Hindi/Tamil speakers prefer the official schemes (with diacritics) or the ASCII-only schemes (more user-friendly??).
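To show what an ASCII-only scheme looks like in practice, here is a minimal Python sketch of Harvard-Kyoto, one such scheme, for Devanagari (the script used for Hindi). It covers only basic vowels, consonants, vowel signs, virama, anusvara and visarga; nukta letters, Hindi schwa deletion and the Bengali/Tamil scripts are not handled, and the function name is hypothetical:

```python
# Subset of Harvard-Kyoto, an ASCII-only romanization for Devanagari.
INDEPENDENT_VOWELS = {
    "अ": "a", "आ": "A", "इ": "i", "ई": "I", "उ": "u", "ऊ": "U",
    "ऋ": "R", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
}
# Dependent vowel signs (matras): they replace a consonant's inherent "a".
VOWEL_SIGNS = {
    "ा": "A", "ि": "i", "ी": "I", "ु": "u", "ू": "U",
    "ृ": "R", "े": "e", "ै": "ai", "ो": "o", "ौ": "au",
}
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "G",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "J",
    "ट": "T", "ठ": "Th", "ड": "D", "ढ": "Dh", "ण": "N",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "z", "ष": "S", "स": "s", "ह": "h",
}
VIRAMA = "\u094d"  # cancels the inherent vowel
MARKS = {"\u0902": "M", "\u0903": "H"}  # anusvara, visarga


def to_harvard_kyoto(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # explicit vowel sign
                i += 2
                continue
            if nxt == VIRAMA:
                i += 2  # bare consonant, e.g. first half of a conjunct
                continue
            out.append("a")  # inherent vowel (no schwa deletion)
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(ch)  # spaces, punctuation, unsupported characters
        i += 1
    return "".join(out)


print(to_harvard_kyoto("नमस्ते"), to_harvard_kyoto("हिन्दी"))
```

Capital letters carry the distinctions diacritics would (e.g. "A" for long ā). Note that "भारत" comes out as the Sanskrit-style "bhArata"; a Hindi-oriented converter would also drop the final inherent vowel.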

If I understand correctly, your goal is:
To input [any-language] characters using phonetic representation -- i.e. actual input, rather than transliteration/romanization.

If so, I think the existing combination of pinyin input methods (e.g. SCIM, and others) + Unicode/locale-specific font(s) is the only solution. (Your proposed new charset/font can be used to input other languages, but it cannot display them.)

For romanization: I think we should use ASCII-only methods rather than create a new charset. In the case of Indic languages (Bengali, Hindi, Tamil, etc.), I don't know those languages at all; but in my uninformed opinion, the proposed new charset would be of very little use.
e.g.
If 2 people want to communicate in Tamil:
-- If no Tamil font is installed, then they would use the most compatible way of transliterating Tamil (ASCII??). If so, they would not want to use an obscure charset...?
-- If a Tamil font is installed, then the proposed new font would be redundant?

Quote (stupid_idiot @ Mar. 13 2008,13:06)

"stupid_idiot",

I put your login in quotes, since it is obviously intended to be ironic, coming from one who responds so intelligently.

Yes, numbers can be used to indicate tones; that is the practice in Jyutping, Yale and Cantonese Pinyin. I suspect that many find the 4 tone marks of Beijinghua more intuitive, since they are pictures of how the tones rise and fall. Personally, I would struggle through deciphering accented Pinyin, but would pass on the numeric notation unless I had a very strong motivation otherwise.

As for my intention, I'm focused on the first impression, à la the first impression one receives when meeting another person. The aim is to make DSL more universally useful, without bloat.

Universally useful is similar to my standard test for new software: boot it up and see if I can use it. If it is intuitive enough that I can do something useful, then I will read the "friendly" manual. I'd like to see a majority of people be able to boot DSL and use it well enough to seek further help and receive it.

Without bloat, to me, means an initial, small-footprint means of access that doesn't unduly burden the 50MB footprint. One 8-bit font, compared to N Unicode fonts, seems more in keeping with the DSL style.
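One way to see the trade-off with existing 8-bit standards: a quick Python check of which tone-marked pinyin vowels ISO-8859-1 (Latin-1), a widely used 8-bit charset, can hold. Latin-1 is only an assumed example of a "standard" 8-bit layout here; the custom font proposed in this thread would use its own slots.

```python
# Which tone-marked pinyin vowels fit in ISO-8859-1 (Latin-1)?
TONE_VOWELS = "āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ"


def fits_latin1(ch: str) -> bool:
    """True if the character has a code point in Latin-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


fitting = [v for v in TONE_VOWELS if fits_latin1(v)]
missing = [v for v in TONE_VOWELS if not fits_latin1(v)]
print("fits:", "".join(fitting))
print("missing:", "".join(missing))
```

Only the acute and grave forms (tones 2 and 4) are present; macron, caron and the tone-marked ü forms are not. So a pinyin-capable 8-bit font would need a non-standard slot layout, which is the compatibility concern raised earlier in the thread.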

Perhaps just including the Indic glyphs in the 128 - 255 range would be enough.  I'm still looking at it.
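A rough way to size that idea: count how many characters Unicode assigns in the blocks for the scripts named in this thread and compare with the 128 upper-half slots of one 8-bit charset. A minimal Python sketch; the assigned counts come from whatever Unicode version the local Python ships, so the exact numbers vary. It also ignores that these scripts rely on glyph shaping (conjuncts, reordered vowel signs), which a simple one-glyph-per-byte font does not do.

```python
import unicodedata

# Unicode block ranges for three scripts named in this thread.
BLOCKS = {
    "Devanagari (Hindi)": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Tamil": (0x0B80, 0x0BFF),
}
UPPER_HALF = 256 - 128  # slots 128-255 of a single 8-bit charset

assigned_counts = {}
for script, (first, last) in BLOCKS.items():
    # Count code points that have a name in this Python's Unicode database.
    assigned_counts[script] = sum(
        1 for cp in range(first, last + 1) if unicodedata.name(chr(cp), None)
    )
    print(f"{script}: {last - first + 1} positions, {assigned_counts[script]} assigned")

total_assigned = sum(assigned_counts.values())
print(f"{total_assigned} assigned characters vs {UPPER_HALF} upper-half slots")
```

Each block spans 128 positions on its own, so the upper half of one 8-bit font can hold roughly one script's basic repertoire, not several.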

What would be great is a system that allows someone to change languages and keyboards with a click of a button, as another poster mentioned. The $64,000 question: is that feasible within a 50MB footprint? John and Robert are the ones to answer that question, and will wisely ignore it until there is something real to assess.
