Web Design


Character encodings for beginners

So what's a character encoding?

Words and sentences in text are created from characters. Examples of characters include the Latin letter á, a Chinese ideograph or a Devanagari letter.


Characters are grouped into a character set (also called a repertoire). This is then called a coded character set when each character is assigned a particular number, called a code point. These code points will be represented in the computer by one or more bytes.

The character encoding is the key that maps code points to bytes in the computer memory, and reads the bytes back into code points.
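To make this concrete, here is a minimal sketch in Python (chosen only for illustration), where `str.encode` applies an encoding to turn code points into bytes and `bytes.decode` reads the bytes back. The helper `round_trip` is a hypothetical name, not part of any library.

```python
def round_trip(text: str, encoding: str) -> tuple[bytes, str]:
    """Encode text into bytes with `encoding`, then decode those bytes back into text."""
    data = text.encode(encoding)      # code points -> bytes
    restored = data.decode(encoding)  # bytes -> code points
    return data, restored


data, restored = round_trip("\u00e9", "utf-8")  # the letter é
print(ord(restored))         # 233, the code point of é
print(data.hex(" "))         # c3 a9, the two UTF-8 bytes
print(restored == "\u00e9")  # True
```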

Basically, you can visualise this by assuming that all characters are stored in computers using a code, like the ciphers used in espionage. A character encoding provides a key to unlock (ie. crack) the code. It is a set of mappings between the bytes representing numbers in the computer and characters in the coded character set. Without the key, the data looks like garbage.
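A short Python sketch of what happens with the wrong key: the same bytes are decoded once with the encoding they were written in (UTF-8) and once with a different one (ISO 8859-1, picked here only as an example).

```python
text = "caf\u00e9"            # "café"
data = text.encode("utf-8")   # written with the UTF-8 key: 63 61 66 c3 a9

right = data.decode("utf-8")       # the correct key restores the text
wrong = data.decode("iso-8859-1")  # the wrong key reads each byte as a separate character

print(right)  # café
print(wrong)  # cafÃ©
```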

The misleading term charset is often used to refer to what are in reality character encodings. You should be aware of this usage, but stick to using the term character encodings whenever you can.

Unfortunately, there are many different character sets and character encodings, ie. many different ways of mapping between bytes, code points and characters. The section Additional information provides a little more detail for those who are interested.

Most of the time, however, you will not need to know the details. You will just need to be sure that you consider the advice in the section How does this affect me? below.

How do fonts fit into this?

A font is a collection of glyph definitions, ie. definitions of shapes used to display characters.

Once your application has worked out what characters it is dealing with, it will then look in the font for glyphs in order to display or print those characters. (Of course, if the encoding information was wrong, it will be looking up glyphs for the wrong characters.)

A given font will usually cover a single character set, or in the case of a large character set like Unicode, just a subset of all the characters in the set. If your font doesn't have a glyph for a particular character, some applications will look for the missing character in other fonts on your system (which will mean that the glyph will look different from the surrounding text, like a ransom note). Otherwise you will typically see a square box, a question mark or some other character instead.


How does this affect me?

As a content author or developer, you should nowadays always choose the UTF-8 character encoding for your content or data. This Unicode encoding is a good choice because you can use a single encoding to handle pretty much any character you are likely to meet. This greatly simplifies things. Using Unicode throughout your system also removes the need to track and convert between various character encodings.

Content authors need to find out how to declare the character encoding used for the document format they are working with.

Note that just declaring a different encoding in your page won't change the bytes; you need to save the text in that encoding too. Content authors need to check what encoding their editor or scripts are saving text in, and how to save text in UTF-8. You may also need to check that your server is serving documents with the right HTTP declarations.
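Assuming the document is HTML, the Python sketch below (for illustration only) separates the two steps: the `<meta charset>` element is the declaration, while the encoding passed when the file is written decides the actual bytes. `make_page` and `index.html` are hypothetical names, and the `Content-Type` line only shows the form of the HTTP declaration, not any particular server's configuration.

```python
import tempfile
from pathlib import Path


def make_page(declared_charset: str) -> str:
    """Build a tiny HTML page whose <meta charset> declares `declared_charset`."""
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{declared_charset}"><title>Test</title></head>\n'
        "<body><p>caf\u00e9</p></body></html>\n"
    )


# Save the page in UTF-8, which matches what it declares.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.html"
    path.write_text(make_page("utf-8"), encoding="utf-8")
    saved = path.read_bytes()

# Changing only the declaration leaves the stored bytes for "é" (c3 a9) untouched,
# so this second page is mislabelled rather than converted.
mislabelled = make_page("iso-8859-1").encode("utf-8")

print(b"caf\xc3\xa9" in saved, b"caf\xc3\xa9" in mislabelled)  # True True

# A matching HTTP header a server could send with the saved file.
content_type_header = "Content-Type: text/html; charset=utf-8"
```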

Developers need to ensure that the various parts of the system can communicate with each other, understand which character encodings are being used, and support all the necessary encodings and characters. (Ideally, you would use UTF-8 throughout, and be spared this trouble.)
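Where part of a system does still hand over text in another encoding, one common approach is to decode it with the encoding it was really written in, as early as possible, and use UTF-8 from then on. A minimal Python sketch, assuming the source encoding is known (ISO 8859-1 here, as an example); `to_utf8` is a hypothetical helper.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes written in `source_encoding`, then re-encode the text as UTF-8."""
    text = data.decode(source_encoding)  # bytes -> characters, using the correct key
    return text.encode("utf-8")          # characters -> UTF-8 bytes


legacy = b"caf\xe9"  # "café" as stored by a hypothetical ISO 8859-1 system
converted = to_utf8(legacy, "iso-8859-1")
print(converted)     # b'caf\xc3\xa9'
```

If the assumed source encoding is wrong, this step produces exactly the kind of garbage described earlier, which is why tracking encodings between components matters.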

The links below provide some further reading on these topics.

Additional information

This section provides a little additional information on mapping between bytes, code points and characters for those who are interested. Feel free to just skip to the section Further reading.

Note that code point numbers are commonly expressed in hexadecimal notation, ie. base 16. For example, 233 in hexadecimal form is E9. Unicode code point values are typically written in the form U+00E9.
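These conversions are easy to check in Python; `unicode_notation` is a hypothetical helper that pads to at least four hexadecimal digits, as the U+ form does.

```python
def unicode_notation(ch: str) -> str:
    """Format a character's code point as U+XXXX (at least four uppercase hex digits)."""
    return f"U+{ord(ch):04X}"


print(format(233, "X"))            # E9
print(int("E9", 16))               # 233
print(unicode_notation("\u00e9"))  # U+00E9 (é)
```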

In the coded character set called ISO 8859-1 (also known as Latin1) the decimal code point value for the letter é is 233. However, in ISO 8859-5, the same code point represents the Cyrillic character щ.

These character sets contain fewer than 256 characters and map code points to byte values directly, so a code point with the value 233 is represented by a single byte with a value of 233. Note that it is only the context that determines whether that byte represents é or щ.
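A minimal Python illustration of that point: one byte with the value 233 is decoded with both keys, using Python's built-in `iso-8859-1` and `iso-8859-5` codecs.

```python
byte = bytes([233])  # a single byte with the value 233 (0xE9)

as_latin1 = byte.decode("iso-8859-1")    # read with the ISO 8859-1 key
as_cyrillic = byte.decode("iso-8859-5")  # read with the ISO 8859-5 key

print(as_latin1)    # é
print(as_cyrillic)  # щ
```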

There are other ways of handling characters from a range of scripts. For example, with the Unicode character set, you can represent both characters in the same set. In fact, Unicode contains, in a single set, probably all the characters you are likely to ever need. While the letter é is still represented by the code point value 233, the Cyrillic character щ now has a code point value of 1097.
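In Python, `ord` returns a character's Unicode code point and `chr` goes the other way, so both values can be confirmed directly:

```python
code_points = {ch: ord(ch) for ch in ("\u00e9", "\u0449")}  # é and щ
print(code_points)  # {'é': 233, 'щ': 1097}
print(chr(1097))    # щ
```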

Bytes these days are usually made up of 8 bits. There are only 2⁸ (ie. 256) unique ways of combining 8 bits.
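The arithmetic can be checked in Python. `fits_in_one_byte` is a hypothetical helper that relies on the built-in `bytes` type rejecting any value outside 0 to 255.

```python
def fits_in_one_byte(value: int) -> bool:
    """Return True if `value` can be stored in a single 8-bit byte (0 to 255)."""
    try:
        bytes([value])  # raises ValueError for values outside range(0, 256)
        return True
    except ValueError:
        return False


print(2 ** 8)                  # 256
print(fits_in_one_byte(233))   # True
print(fits_in_one_byte(1097))  # False
```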

On the other hand, 1097 is too large a number to be represented by a single byte. So, if you use the character encoding for Unicode text called UTF-8, щ will be represented by two bytes. However, the code point value is not simply derived from the value of the two bytes spliced together; some more complicated decoding is needed.
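In UTF-8, a two-byte sequence has the bit pattern 110xxxxx 10yyyyyy, and the code point is rebuilt from the x and y payload bits. The Python sketch below shows that splicing the raw bytes gives the wrong number, then decodes the pair by hand. `decode_two_byte_utf8` is a hypothetical, simplified helper for this one case: it does not reject overlong forms or handle three- and four-byte sequences, which a real decoder such as Python's built-in `bytes.decode` must.

```python
encoded = "\u0449".encode("utf-8")  # щ, code point 1097
print(encoded.hex(" "))             # d1 89

spliced = int.from_bytes(encoded, "big")  # naive: just join the two bytes together
print(spliced)                            # 53641, not 1097


def decode_two_byte_utf8(b: bytes) -> int:
    """Decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into its code point."""
    first, second = b
    if first & 0b1110_0000 != 0b1100_0000 or second & 0b1100_0000 != 0b1000_0000:
        raise ValueError("not a two-byte UTF-8 sequence")
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return ((first & 0b0001_1111) << 6) | (second & 0b0011_1111)


print(decode_two_byte_utf8(encoded))  # 1097
```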

Other Unicode characters map to one, three or four bytes in the UTF-8 encoding.

Furthermore, note that the letter é is also represented by two bytes in UTF-8, not the single byte used in ISO 8859-1. (Only ASCII characters are encoded with a single byte in UTF-8.)
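Both points can be checked with Python's built-in codecs. The sample characters are chosen for illustration: A (ASCII), é, щ, the Devanagari letter क and U+1F600 (an emoji).

```python
samples = ["A", "\u00e9", "\u0449", "\u0915", "\U0001F600"]  # A, é, щ, क, an emoji
utf8_lengths = {f"U+{ord(ch):04X}": len(ch.encode("utf-8")) for ch in samples}
print(utf8_lengths)
# {'U+0041': 1, 'U+00E9': 2, 'U+0449': 2, 'U+0915': 3, 'U+1F600': 4}

# é takes one byte in ISO 8859-1 but two in UTF-8.
print(len("\u00e9".encode("iso-8859-1")), len("\u00e9".encode("utf-8")))  # 1 2
```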

UTF-8 is the most widely used way to represent Unicode text in web pages, and you should always use UTF-8 when creating your web pages and databases. But, in principle, UTF-8 is only one of the possible ways of encoding Unicode characters. In other words, a single code point in the Unicode character set can actually be mapped to different byte sequences, depending on which encoding was used for the document. Unicode code points could be mapped to bytes using any one of the encodings called UTF-8, UTF-16 or UTF-32. The Devanagari character क, with code point 2325 (which is 915 in hexadecimal notation), will be represented by two bytes when using the UTF-16 encoding (09 15), three bytes with UTF-8 (E0 A4 95), or four bytes with UTF-32 (00 00 09 15).
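The three byte sequences can be reproduced with Python's built-in codecs. The big-endian variants (`utf-16-be`, `utf-32-be`) are used here as an assumption so that no byte order mark is added and the bytes appear in the same order as in the paragraph above.

```python
ka = "\u0915"  # क, code point 2325 (hex 915)

for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    print(encoding, ka.encode(encoding).hex(" ").upper())
# utf-8 E0 A4 95
# utf-16-be 09 15
# utf-32-be 00 00 09 15
```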

There can be further complications beyond those described in this section (such as byte order and escape sequences), but the detail described here shows why it is important that the application you are working with knows which character encoding is appropriate for your data, and knows how to handle that encoding.
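Byte order is one of those complications. A small Python sketch with the same character: UTF-16 can store its two bytes in either order, a byte order mark (BOM) can record which order was used, and reading with the wrong order silently yields a different character.

```python
ka = "\u0915"  # क

big = ka.encode("utf-16-be")     # most significant byte first: 09 15
little = ka.encode("utf-16-le")  # least significant byte first: 15 09
with_bom = ka.encode("utf-16")   # a BOM (ff fe or fe ff) followed by the two bytes

print(big.hex(" "), "|", little.hex(" "), "|", with_bom.hex(" "))

# Decoding big-endian bytes as little-endian gives U+1509 instead of U+0915.
print(f"U+{ord(big.decode('utf-16-le')):04X}")  # U+1509
```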


Further reading

The article Character encodings: Essential concepts provides some gentle introductions to related topics, such as Unicode, UTF-8, character sets, coded character sets and encodings, the document character set, character escapes and the HTTP header.

By: Richard Ishida, W3C.



When people say 'HTML5', they usually mean a bit more than just the 5th version of the "HyperText Markup Language". Modern Web pages and Web applications are generally composed of at least three components, so what people often mean when they say 'HTML5' is the trio of languages: HTML5, CSS3 and JavaScript.

The 'HTML' part contains all the content, organized into a logical structure.  This is the part that an author might be most concerned with: the words, chapter headings, figures, diagrams, etc. 

While there have been numerous versions of HTML since its inception, our focus in this course is the most recent version, HTML5.  HTML5 was developed to provide more powerful and flexible ways for developers to create dynamic Web pages.



The 'CSS' part (version 3 being current) is all about the presentation or style of the page; what it looks like without too much regard for the specific content. We'll be going into more detail on that later in this course, but for now, think of it as the way you might specify a "theme" in a word processing document, setting fonts, sizes, indentations and whatever else may apply to what it looks like.




The 'JavaScript' part is about the actions a page can take such as interaction with the user, and customizing and changing the page according to any number of parameters.  This is what allows a Web page to be more than just a document, but potentially a Web application, with nearly unlimited possibilities.  We will not be doing much with JavaScript in this course, but you should know that it is an important leg of the stool for modern Web pages.



Source: EDX


