
"We need a web site."

Now what?
When you have plenty of ideas, but not a lot of answers, the next step is WeavingtheWeb.com.


PHP Tip of the Week Archive

  • If Brevity: The Ternary Operator
  • Easy Database Inserts
  • Output Buffering and Editing Web Pages On the Fly
  • Searching Arrays
  • Random Thoughts
  • Faster Page Transfer Using Compression
  • Persistent Values in Web Forms
  • Using HEREDOC Syntax with HTML
  • Calculating Dates More Easily
  • Using Single and Double Quotes
  • Timing Your PHP Scripts
  • When Nothing is Something: Zero, Empty, Null, and Negative Values
  • Nesting Instincts
  • Useful (for) Characters: The Increment Operator
  • The "static" keyword
  • Those !#@% Characters!

    It’s frustrating when text retrieved from a database and displayed on your web page contains garbled characters that are clearly mistranslated Microsoft™ Word characters, like:

    Here is an example of ?double quotes.?

    ...when you really expected:

    Here is an example of “double quotes.”

    Four elements control the translation of characters from database to web page, including what PHP contributes to the mix:

    1. The character set used to store the text in the database.

    2. The default character set used by the php parser.

    3. The character set specified to the browser through a “Content-Type” header.

    4. The character set in a meta tag in the web page itself.

    The bottom line: when those special Windows characters are stored in the database as single-byte windows-1252 text, specifying a character set of “windows-1252” (or “iso-8859-1”, which browsers treat as windows-1252) lets browsers display them correctly. Labeling the page “utf-8” will not display those characters correctly unless the text is first converted to UTF-8. See [1] for more about iso-8859-1 and windows-1252 and their interchangeability in browsers, and [2] for a very good summary of this whole sticky issue.
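
    As a rough illustration of why the label matters, here is a minimal sketch. It assumes the mbstring extension is loaded and that the database returned single-byte windows-1252 text; the variable names are made up for this example.

    ```php
    <?php
    // Assumption: this is what came back from the database, stored as windows-1252.
    // 0x93 and 0x94 are the windows-1252 bytes for the curly double quotes.
    $fromDb = "Here is an example of \x93double quotes.\x94";

    // Those lone bytes are not valid UTF-8, so a page labeled utf-8 cannot show them.
    $validUtf8 = mb_check_encoding($fromDb, 'UTF-8');

    // If the page must be utf-8, convert the text instead of relabeling the page.
    $asUtf8 = mb_convert_encoding($fromDb, 'UTF-8', 'Windows-1252');

    var_dump($validUtf8);                           // is the raw text valid UTF-8?
    var_dump(mb_check_encoding($asUtf8, 'UTF-8'));  // is the converted text valid UTF-8?
    ```

    If the stored bytes are left as they are, the page has to be labeled windows-1252 (or iso-8859-1) instead; the conversion step is only needed when the page is served as utf-8.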

    Let’s briefly consider how each element I’ve noted affects display of text to the browser:

    1. The character set used to store the text in the database.

    Often the default character set used by the database server and/or a specific table will be the “latin” (or “latin1”) character set. MySQL’s latin1 is based on windows-1252, so it will properly store those “odd” Word characters.

    To determine MySQL’s default character set:

    <?php

    // Set a mysqli connection handle before this, named $dbh
    // (for example with mysqli_connect())

    $q = "SHOW VARIABLES LIKE 'character_set_database'";
    // For MySQL versions before 5.0, try "character_set" instead of "character_set_database"

    $r = mysqli_query($dbh, $q);
    print "<pre>\n\n";
    // Print each returned variable name/value pair
    while ($row = mysqli_fetch_assoc($r)) {
      print_r($row);
    }
    print "</pre>";

    ?>

    For the not-so-faint-of-heart, the dev.mysql.com site has a good article on mysql’s use of utf-8 and its effect on web page encoding. [3]

    2. The default character set used by the php parser.

    This can be retrieved by using:

    echo ini_get("default_charset");

    That character set can be set by using:

    ini_set("default_charset","iso-8859-1");

    The “iso-8859-1” character set is also known as “Latin-1”. It is the first part of the ISO 8859 family of character sets for Latin-alphabet languages.

    Setting PHP’s character set to match the character set of your database is critical, because this is where character mistranslations often occur (especially if the default for PHP or for the server is the utf-8 character set).
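
    One place where this setting becomes visible is htmlspecialchars(), which falls back to default_charset when no encoding argument is passed. The sketch below assumes PHP 5.6 or later, and the sample text and variable names are illustrative only.

    ```php
    <?php
    // Windows-1252 curly quotes, as they might arrive from a latin1 table.
    $text = "\x93double quotes.\x94";

    // ini_set() returns the previous value, which is kept here for reference.
    $old = ini_set('default_charset', 'UTF-8');

    // No encoding argument: htmlspecialchars() uses default_charset.
    // Under UTF-8 the lone 0x93/0x94 bytes are invalid, and with these flags
    // the function returns an empty string for invalid input.
    $underUtf8 = htmlspecialchars($text, ENT_QUOTES);

    // Match the single-byte data instead: every byte is now acceptable.
    ini_set('default_charset', 'iso-8859-1');
    $underLatin1 = htmlspecialchars($text, ENT_QUOTES);

    var_dump($underUtf8 === '');       // was the text lost under UTF-8?
    var_dump($underLatin1 === $text);  // did the text survive under iso-8859-1?
    ```

    The flags are passed explicitly because the default flags of htmlspecialchars() differ between PHP versions.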

    3. The character set specified to the browser through a “Content-Type” header.

    It is important that you inform the browser receiving the retrieved text exactly what character set to use to display that text. One method is to use php’s header() function:

    <?php
    //Be clear about the charset used:
    header("Content-Type: text/html; charset=iso-8859-1");
    ?>

    This too can be an effective way to make sure that select Word characters are translated correctly to the browser.

    4. The character set in a meta tag in the web page itself.

    A meta tag in the <head> of your web page, like:

    <meta http-equiv="content-type" content="text/html;charset=iso-8859-1">

    …will likely be of limited value in ensuring that all characters in your page’s text are translated correctly. A character set sent in the HTTP “Content-Type” header takes precedence over the <meta> tag, and the “odd” characters in question may already have been mistranslated before they come under the influence of the <meta> tag in the html page itself.

    Character sets and character encoding are very broad subjects, and this brief article is meant only to help basic troubleshooting for character mistranslation. (For instance, the use of an .htaccess file on the server can also help with character set control.) A more comprehensive tutorial is noted below. [4]


    [1] http://tinyurl.com/yqsyr9
    [2] http://www.joelonsoftware.com/articles/Unicode.html
    [3] http://tinyurl.com/8nz79
    [4] A Character Set and Encodings Tutorial, http://www.w3.org/International/tutorials/tutorial-char-enc/

  • The Value of PHP Classes and OOP
  • The Simple Things in (programming) Life
  • Just a Passing Reference
  • Simple Image Creation with PHP