DRX
Web Developer Resource Index: Character Encoding
Character Encoding is a method of representing human languages (characters, letters, digits, symbols, punctuation) using numbers stored at the machine level (in binary). Conceptually this idea predates computing substantially—consider Morse code for instance.
There’s an old joke: “the nice thing about standards is there are so many to choose from.” This is no joke when it comes to character sets: there is a dizzying array of them, too many to list. The one most people, at least Americans, are familiar with is ASCII, a very basic set of 128 characters (7 bits), 95 of which are printable. The others are so-called “control” characters.
ASCII is fine for a simple language like English (simple at least in terms of the character set), but what about something like Japanese, which has thousands of characters (and several sets)?
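The ASCII figures quoted above, and the trouble with a language like Japanese, can be checked in a few lines. This is only a minimal sketch in Python; the helper name `fits_in_ascii` is made up for illustration and is not taken from any of the resources listed here.

```python
# The 128 ASCII code points are 0..127, which fit in 7 bits.
ascii_chars = [chr(n) for n in range(128)]

# Printable characters run from space (32) through tilde (126);
# everything else in the range is a control character.
printable = [c for c in ascii_chars if 32 <= ord(c) <= 126]
control = [c for c in ascii_chars if not 32 <= ord(c) <= 126]

print(len(ascii_chars), len(printable), len(control))  # 128 95 33


def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character has an ASCII code point."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("Hello, world"))  # True
print(fits_in_ascii("日本語"))  # False
```

English text passes through untouched, while the Japanese sample cannot be represented at all.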
Recently a new standard encoding system has emerged that can represent both simple and complex languages while remaining backward compatible with ASCII. This is Unicode, and for the Web to function on an international stage, its time is now.
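To make the idea of one code point per character concrete, here is a small Python sketch. It assumes UTF-8 as the byte encoding, which is only one of several Unicode encodings, and the function name `describe` is invented for this example.

```python
def describe(text: str) -> list[tuple[str, str, bytes]]:
    """Hypothetical helper: pair each character with its code point and UTF-8 bytes."""
    return [(ch, f"U+{ord(ch):04X}", ch.encode("utf-8")) for ch in text]


for ch, code_point, raw in describe("Aé日"):
    # Show the bytes as space-separated hexadecimal pairs.
    print(ch, code_point, " ".join(f"{b:02x}" for b in raw))
# A U+0041 41
# é U+00E9 c3 a9
# 日 U+65E5 e6 97 a5

# Backward compatibility: pure ASCII text produces identical bytes either way.
sample = "plain ASCII"
print(sample.encode("utf-8") == sample.encode("ascii"))  # True
```

The letter A keeps its single ASCII byte, while the accented and Japanese characters take two and three bytes respectively.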
Perhaps now someone would like to guess what language I’m using (and what the meaning is) for that silly Unicode button over in the left sidebar.
Updated: Wednesday, December 24th, 2008 @ 11:56 PM EST
Resources
-
1.Unicode
Regardless of the language, the platform (computer architecture), the country or the software being used, Unicode offers a single, unique encoding (code point or numeric representation) for every possible character. [1131]
★★★★★
URI:http://unicode.org/
Author:Unicode Consortium [1]
Reviewed:Thursday, May 12th, 2005 @ 11:16 AM EDT
by:Douglas Clifton
Tags:charset, encoding, unicode
-
2.Characters and Encoding
An incredible amount of detail and history here. The "Detailed descriptions of the characters" under "The ISO Latin 1 character repertoire" is particularly good, and includes full annotation of each character. [1136]
★★★★★
URI:http://www.cs.tut.fi/~jkorpela/chars/
Author:Jukka Korpela [1]
Reviewed:Thursday, May 12th, 2005 @ 12:29 PM EDT
by:Douglas Clifton
Tags:charset, encoding, unicode
-
3.HTML Entity Character Lookup
There are countless character entity sites out there, and I sometimes forget which one(s) I prefer. This Firefox extension eliminates that issue. Simply open it from the Tools menu and start typing the name or code point and you get a list of matches. [1869]
★★★★☆
URI:http://www.yining.org/firefox/extensions/html-entity-char-lookup/
Author:Zhang Yining [1]
Reviewed:Wednesday, December 24th, 2008 @ 11:56 PM EST
by:Douglas Clifton
Tags:charset, encoding, extension, firefox, html
-
4.UTF-8: The Secret of Character Encoding
This very thorough tutorial is aimed at both the developer and the end-user. It covers some history, finding and fixing encodings, Unicode and UTF, HTTP headers and the HTML meta http-equiv tag, forms, configuring your database, fonts, and PHP functions. [1803]
★★★★☆
URI:http://htmlpurifier.org/docs/enduser-utf8.html
Author:Edward Z. Yang [2]
Reviewed:Thursday, November 6th, 2008 @ 12:25 AM EST
by:Douglas Clifton
Tags:apache, charset, encoding, fonts, html, http, i18n, media types, php, unicode, utf8
-
5.W3C Internationalization (I18n) Activity
The tutorial on character sets and encodings listed below is just the tip of the iceberg if you're interested in learning more about Internationalization (i18n) on the Web. News, articles, techniques, and more on the topic from the W3C are available here. [1784]
★★★★☆
URI:http://www.w3.org/International/
Author:I18n Core Working Group [2]
Reviewed:Wednesday, October 29th, 2008 @ 2:22 PM EDT
by:Douglas Clifton
Tags:charset, i18n, tutorial, unicode, utf8, w3c
-
6.PHP UTF-8 cheatsheet
LAMP developers who plan on creating sites that support Internationalization (i18n) should stop by this article first. Altering MySQL tables to use UTF-8, installing and configuring the PHP mbstring extension, and changes to their code are all covered. [1782]
★★★★☆
URI:http://www.nicknettleton.com/zine/php/php-utf-8-cheatsheet
Author:Nick Nettleton [1]
Reviewed:Wednesday, October 29th, 2008 @ 5:24 AM EDT
by:Douglas Clifton
Tags:charset, config, developer, i18n, install, mysql, php, programming, unicode, utf8
-
7.HTML Entity Character Lookup
A nice tool for finding HTML character entities either by name or code point, and a big improvement over manually scanning through hundreds of rows in the many table formats out there. It behaves like an Ajax application, but even the data is stored as JavaScript. [1699]
★★★★☆
URI:http://leftlogic.com/lounge/articles/entity-lookup/
Author:Remy Sharp [1]
Reviewed:Sunday, June 17th, 2007 @ 9:40 PM EDT
by:Douglas Clifton
Tags:charset, encoding, html, javascript, markup
-
8.The Definitive Guide to Web Character Encoding
As usual, Tommy delivers a completely readable and thoroughly detailed article. This time, on character encoding. If this topic is new to you, this is a great overview. Covers various standards, browser support, sending the correct headers and more. [1697]
★★★★☆
URI:http://www.sitepoint.com/article/guide-web-character-encoding
Author:Tommy Olsson [3]
Reviewed:Saturday, January 20th, 2007 @ 4:41 PM EST
by:Douglas Clifton
Tags:browser, charset, encoding, standards, unicode
-
9.XHTML Character Entity Reference
This page contains the 252 allowed entities in HTML 4 and XHTML 1.0. The entities are divided and color coded into logical categories which enables the user to filter the tabular view of the characters. Each entity includes name, decimal and Unicode hex. [1548]
★★★★☆
URI:http://digitalmediaminute.com/reference/entity/
Author:Jim Rutherford [1]
Reviewed:Monday, November 7th, 2005 @ 12:52 AM EST
by:Douglas Clifton
Tags:charset, html, reference, unicode, xhtml
-
10.International Components for Unicode
The ICU is a mature and widely used set of C, C++ and Java libraries for Unicode support, software internationalization (i18n) and globalization (g11n). It was expanded from the JDK 1.1 internationalization APIs, which the ICU team contributed to. [1522]
★★★★☆
URI:http://icu.sourceforge.net/
Author:ICU Team [1]
Reviewed:Tuesday, August 30th, 2005 @ 10:14 PM EDT
by:Douglas Clifton
Tags:api, c, java, unicode
-
11.Computers and Writing Systems
The place to visit for non-Roman authoring resources, including Unicode fonts, licensing information, and an incredible array of articles on translation, linguistics and publishing. From the NRSI, a research and development team within SIL International. [1441]
★★★★☆
URI:http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi
Author:SIL International [1]
Reviewed:Thursday, July 28th, 2005 @ 11:38 PM EDT
by:Douglas Clifton
Tags:encoding, fonts, resources, translation, unicode
-
12.Gallery of Unicode Fonts
Samples of available fonts in many different languages and writing systems, including Cirth and Tengwar for you Tolkien fans out there. Confession: I'm one of them. Links to other font resources, notably for Linux, FreeBSD and similar open-source OSs. [1213]
★★★★☆
URI:http://www.travelphrases.info/fonts.html
Author:David McCreedy [1]
Reviewed:Friday, May 20th, 2005 @ 9:20 PM EDT
by:Douglas Clifton
Tags:fonts, freebsd, gallery, linux, open-source, os, unicode
-
13.Unicode and Character Sets for Software Developers
It's about time I added a listing for one of Joel's articles. This is a great tutorial on character sets and encoding for the uninitiated. After some background and history, he explains Unicode code points and their hexadecimal representation. [1212]
★★★★☆
URI:http://joelonsoftware.com/articles/Unicode.html
Author:Joel Spolsky [1]
Reviewed:Friday, May 20th, 2005 @ 9:54 AM EDT
by:Douglas Clifton
Tags:charset, developer, software, unicode
-
14.Character Sets and Encodings in XHTML, HTML and CSS
A detailed tutorial on international and special character encoding for Web documents. Presented as a series of slides with navigation. Includes three different views: all-in-one with small images, slide-by-slide with larger images, and text only. [1211]
★★★★☆
URI:http://www.w3.org/International/tutorials/tutorial-char-enc/
Author:I18n Core Working Group [2]
Reviewed:Friday, May 20th, 2005 @ 8:47 AM EDT
by:Douglas Clifton
Tags:charset, css, encoding, html, xhtml
-
15.URL Encode Chart
With PHP's urlencode() and urldecode() functions (and similar methods via CGI.pm and so on), we can usually forget about the details of encoding URL query string values. However, sometimes it's nice to have a reference chart handy. So here you go. [1154]
★★★★☆
URI:http://i-technica.com/whitestuff/urlencodechart.html
Author:Helen Triolo [1]
Reviewed:Sunday, May 15th, 2005 @ 3:10 AM EDT
by:Douglas Clifton
Tags:encoding, perl, php, reference, url
-
16.Unicode Resources
Tired of seeing those "?" characters on international Web sites? Alan's guides and resource lists include a plethora of tips for enabling Unicode support no matter what your operating system is. And if Unicode fonts are what you're after... [1139]
★★★★☆
URI:http://alanwood.net/unicode/
Author:Alan Wood [1]
Reviewed:Friday, May 13th, 2005 @ 12:24 AM EDT
by:Douglas Clifton
Tags:fonts, os, unicode
-
17.Thoughts on Character Entities
If you're using XHTML, remember that it is in fact XML, which only supports five named character entities by default. And even declaring a DTD that includes the named varieties is no guarantee all browsers will support them. [1138]
★★★★☆
URI:http://norman.walsh.name/2003/11/13/charent
Author:Norman Walsh [3]
Reviewed:Thursday, May 12th, 2005 @ 11:56 PM EDT
by:Douglas Clifton
Tags:charset, encoding, xml
-
18.Letter Database: Languages, Character Sets and Names
Not the easiest form to get used to, but once you do there is a wealth of information stored in this database. Jukka ("Yucca") is linking to this database from his "Detailed description of the characters" resource. [1137]
★★★★☆
URI:http://www.eki.ee/letter/
Author:Eesti Keele Instituut [1]
Reviewed:Thursday, May 12th, 2005 @ 12:52 PM EDT
by:Douglas Clifton
Tags:charset, encoding, unicode
-
19.Unicode Chart
Ian must have been thinking of me when he took on this project. Unbelievable, he has created a 6 x 12 foot wall poster of every Unicode character in the arsenal. I can't even imagine the amount of work that went into it. Fascinating stuff. [1135]
★★★★☆
URI:http://ianalbert.com/misc/unichart.php
Author:Ian Albert [1]
Reviewed:Thursday, May 12th, 2005 @ 12:06 PM EDT
by:Douglas Clifton
Tags:encoding, unicode
-
20.Character Entity References in HTML
Introduction and background to character entities and encoding in HTML 4.01. Includes ISO 8859-1 (Latin-1), symbols, math symbols, Greek letters and markup-significant and international characters. The lists are in DTD format (not exactly friendly). [1134]
★★★★☆
URI:http://www.w3.org/TR/1999/REC-html401-19991224/sgml/entities.html
Author:Dave Raggett [3]
Reviewed:Thursday, May 12th, 2005 @ 11:51 AM EDT
by:Douglas Clifton
Tags:charset, encoding, html, w3c
-
21.A Simple Character Entity Chart
An introduction and background on the use of character entities in Web markup (typically HTML) and a nice set of charts that include the named entity, number value, character and description of each character, including math symbols and Greek letters. [1133]
★★★★☆
URI:http://evolt.org/article/A_Simple_Character_Entity_Chart/17/21234/
Author:Adrian Roselli [2]
Reviewed:Thursday, May 12th, 2005 @ 11:36 AM EDT
by:Douglas Clifton
Tags:encoding, html
-
22.A Brief History of Character Codes
Brief, as in a nap? Phew! Very detailed history of character encoding systems including ASCII, EBCDIC, Unicode and TRON. Make a strong pot of coffee, and learn something... [1132]
★★★★☆
URI:http://tronweb.super-nova.co.jp/characcodehist.html
Author:Steven J. Searle [1]
Reviewed:Thursday, May 12th, 2005 @ 11:19 AM EDT
by:Douglas Clifton
Tags:charset, encoding, history, unicode
Matching Tags
apache api browser c charset config css developer encoding extension firefox fonts freebsd gallery history html http i18n install java javascript linux markup media types mysql open-source os perl php programming reference resources software standards translation tutorial unicode url utf8 w3c xhtml xml