12 Aug 2011
XSLT Template for Turkish Hyphenation
Written by: Software Development Unit | Category: Techno Lib
Although they are mainly used for data exchange between different systems, the main strength of XML and XSLT is their ability to separate program code from presentation. The programmer can concentrate on purely programming-related work such as algorithm coding, data storage and data management, while the graphic designer does his or her job independently.

In this article we present an XSLT implementation of a hyphenation algorithm suitable for the Turkish language. We originally developed this algorithm as a QuarkXPress page layout application extension for the national daily newspaper SABAH.

While it is the programmer's job to query a text from the database and make it ready for consumption, decisions about the presentation of that text should be made and implemented independently by the graphic designer.

In this article we will see how a text written in Turkish can be hyphenated for presentation in pure XSLT, with no low-level programming needed.

The following assumptions keep the sample XSLT templates simple:

  • The text to be hyphenated does not contain any punctuation characters (this can easily be achieved with the translate() function available in XSLT).
  • The words of the text to be hyphenated are separated by a single space (' ') character (this can easily be achieved with the normalize-space() function available in XSLT).
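
Both assumptions can be met in one preprocessing step. The fragment below is a minimal sketch that belongs inside a stylesheet; the variable names ham (raw text) and temiz (cleaned text) and the chosen punctuation set are assumptions made for illustration, not part of the original templates.

```xml
<!-- Hypothetical preprocessing step; $ham is assumed to hold the raw text. -->
<!-- translate() maps each of the 8 listed punctuation characters to a space,
     so words joined only by punctuation (e.g. "bir,iki") are not glued
     together. normalize-space() then trims both ends and collapses every
     run of whitespace into a single space. -->
<xsl:variable name="temiz"
  select="normalize-space(translate($ham, '.,;:!?()', '        '))"/>
```

The apostrophe is left out of the list because it cannot appear inside a single-quoted XPath 1.0 string literal; handling it would need an extra variable holding that character.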

The input text is separated into words before hyphenation. For this purpose we use the metin-hecele XSLT template shown below and pass the text to be hyphenated in the $metin parameter.

<xsl:template name="metin-hecele">
 <xsl:param name="metin"/>
 <xsl:choose>
  <!-- More than one word left: hyphenate the first, then recurse on the rest -->
  <xsl:when test="contains($metin,' ')">
   <xsl:call-template name="he-ce-le">
    <xsl:with-param name="sozcuk" select="substring-before($metin,' ')"/>
   </xsl:call-template>
   <xsl:text> </xsl:text>
   <xsl:call-template name="metin-hecele">
    <xsl:with-param name="metin" select="substring-after($metin,' ')"/>
   </xsl:call-template>
  </xsl:when>
  <!-- Last (or only) word -->
  <xsl:otherwise>
   <xsl:call-template name="he-ce-le">
    <xsl:with-param name="sozcuk" select="$metin"/>
   </xsl:call-template>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

The above template breaks the whole text into individual words and hyphenates each word in sequence by calling the template named he-ce-le.
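
A stylesheet would typically invoke metin-hecele from an ordinary match template. The sketch below assumes, purely for illustration, that the source document wraps each paragraph in an element named paragraf and that the output is HTML; neither detail comes from the original article.

```xml
<!-- Hypothetical caller: assumes the input wraps each paragraph in a
     <paragraf> element; adjust the match pattern to the real input. -->
<xsl:template match="paragraf">
  <p>
    <xsl:call-template name="metin-hecele">
      <!-- normalize-space() guarantees single spaces between words -->
      <xsl:with-param name="metin" select="normalize-space(.)"/>
    </xsl:call-template>
  </p>
</xsl:template>
```

Because metin-hecele calls itself once per word, very long texts may run into a processor's recursion limit; passing one paragraph at a time, as here, keeps the depth small.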

The Turkish hyphenation algorithm that we implement basically uses the number of consonants between two vowels, as well as the position of the character r within this consonant sequence, to calculate the positions of soft hyphens within the word.

To find the positions of the vowels and of the r characters easily with a single operation, we use a special variable named $ozel. It temporarily holds a copy of the word to be hyphenated in which every vowel is replaced with a lower-case 'a' and every upper-case 'R' is replaced with its lower-case equivalent ('r').

For the soft hyphen we use the numeric character reference '&#173;' (Unicode U+00AD).

Lastly, the $say variable stores the number of consonants between the two vowels being processed.
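
The value of $ozel is easiest to see on a concrete word. The fragment below is a small sketch meant to sit inside a template; the sample word and the variable names ornek, ornek-ozel and ilk-unlu are assumptions introduced only for this illustration. Since translate() replaces characters one for one, a position found in the mapped copy is also valid in the original word.

```xml
<!-- Illustration only: 'Türkçe' is an arbitrary sample word. -->
<xsl:variable name="ornek" select="'Türkçe'"/>

<!-- Vowels of either case collapse to 'a'; 'R' folds to 'r'.
     For 'Türkçe' the result is 'Tarkça'. -->
<xsl:variable name="ornek-ozel"
  select="translate($ornek, 'aeıioöuüAEIİOÖUÜR', 'aaaaaaaaaaaaaaaar')"/>

<!-- Position of the first vowel: the length of the prefix before the
     first 'a', plus one. For 'Tarkça' this is 2. -->
<xsl:variable name="ilk-unlu"
  select="string-length(substring-before($ornek-ozel, 'a')) + 1"/>
```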

The algorithm is described by the following four rules:

  1. If the number of consonants between the two vowels is zero (0) or one (1), put a soft hyphen just after the first vowel.
  2. If the number of consonants between the two vowels is three (3) and the third consonant is the character 'r', put a soft hyphen just after the first consonant.
  3. Otherwise, put a soft hyphen in the middle of the consonant sequence, that is, after the first ceiling(n / 2) of its n consonants.
  4. Continue hyphenating from the second vowel, which becomes the first vowel of the next step.
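
The rules can be checked against a few words by calling the he-ce-le template, whose implementation follows. The calls below are a sketch meant to sit inside a template; the sample words are assumptions chosen for this illustration, and the '-' in each comment marks where the template emits a soft hyphen.

```xml
<!-- Rule 1, no consonant between 'a' and 'a': sa-at -->
<xsl:call-template name="he-ce-le">
  <xsl:with-param name="sozcuk" select="'saat'"/>
</xsl:call-template>

<!-- Rule 1, one consonant between 'a' and 'e': ka-lem -->
<xsl:call-template name="he-ce-le">
  <xsl:with-param name="sozcuk" select="'kalem'"/>
</xsl:call-template>

<!-- Rule 2, 'ktr' between 'e' and 'i' with 'r' in third place: e-lek-trik -->
<xsl:call-template name="he-ce-le">
  <xsl:with-param name="sozcuk" select="'elektrik'"/>
</xsl:call-template>

<!-- Rule 3, 'pl' between 'a' and 'a', split after 1 consonant: ki-tap-lar -->
<xsl:call-template name="he-ce-le">
  <xsl:with-param name="sozcuk" select="'kitaplar'"/>
</xsl:call-template>

<!-- Rule 3, 'rkç' between 'ü' and 'e', split after 2 consonants: Türk-çe -->
<xsl:call-template name="he-ce-le">
  <xsl:with-param name="sozcuk" select="'Türkçe'"/>
</xsl:call-template>
```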

The implementation of the above algorithm is shown below as an XSLT template named he-ce-le:

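<xsl:template name="he-ce-le">
 <xsl:param name="sozcuk"/>
 <!-- Soft hyphen (U+00AD) -->
 <xsl:param name="tire" select="'&#173;'"/>

 <!-- Copy of the word with every vowel mapped to 'a' and 'R' mapped to 'r' -->
 <xsl:variable name="ozel" select="translate($sozcuk, 'aeıioöuüAEIİOÖUÜR','aaaaaaaaaaaaaaaar')"/>

 <!-- $a: the word up to and including its first vowel -->
 <xsl:variable name="a" select="substring($sozcuk, 1 ,string-length(substring-before($ozel,'a'))+1)"/>
 <!-- $bo, $b: what follows that vowel in $ozel and in the word itself -->
 <xsl:variable name="bo" select="substring($ozel,string-length($a)+1)"/>
 <xsl:variable name="b" select="substring($sozcuk,string-length($a)+1)"/>
 <!-- $c: the consonants before the next vowel; $say: their count -->
 <xsl:variable name="c" select="substring-before($bo,'a')"/>
 <xsl:variable name="say" select="string-length($c)"/>
 <xsl:choose>
  <!-- No second vowel: the word cannot be split any further -->
  <xsl:when test="not(contains($bo,'a'))">
   <xsl:value-of select="$sozcuk"/>
  </xsl:when>
  <!-- Rule 1: hyphen right after the first vowel -->
  <xsl:when test="$say=0 or $say=1">
   <xsl:value-of select="concat($a,$tire)" disable-output-escaping="yes"/>
   <xsl:call-template name="sonrasi">
    <xsl:with-param name="sozcuk" select="$b"/>
    <xsl:with-param name="tire" select="$tire"/>
   </xsl:call-template>
  </xsl:when>
  <!-- Rule 2: three consonants with 'r' third, hyphen after the first one -->
  <xsl:when test="$say=3 and substring($bo,3,1)='r'">
   <xsl:value-of select="concat($a,substring($c,1,1),$tire)" disable-output-escaping="yes"/>
   <xsl:call-template name="sonrasi">
    <xsl:with-param name="sozcuk" select="substring($b,2)"/>
    <xsl:with-param name="tire" select="$tire"/>
   </xsl:call-template>
  </xsl:when>
  <!-- Rule 3: hyphen after the first ceiling($say div 2) consonants -->
  <xsl:otherwise>
   <xsl:variable name="yer" select="ceiling( $say div 2)"/>
   <xsl:value-of select="concat($a,substring($c,1,$yer),$tire)" disable-output-escaping="yes"/>
   <xsl:call-template name="sonrasi">
    <xsl:with-param name="sozcuk" select="substring($b,$yer + 1)"/>
    <xsl:with-param name="tire" select="$tire"/>
   </xsl:call-template>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

<!-- Rule 4: continue with he-ce-le while the remainder still contains a vowel -->
<xsl:template name="sonrasi">
 <xsl:param name="sozcuk"/>
 <!-- Passed through so that a caller-supplied hyphen is used for the whole word -->
 <xsl:param name="tire" select="'&#173;'"/>
 <xsl:variable name="ozel" select="translate($sozcuk, 'aeıioöuüAEIİOÖUÜ','aaaaaaaaaaaaaaaa')"/>
 <xsl:choose>
  <xsl:when test="contains($ozel,'a')">
   <xsl:call-template name="he-ce-le">
    <xsl:with-param name="sozcuk" select="$sozcuk"/>
    <xsl:with-param name="tire" select="$tire"/>
   </xsl:call-template>
  </xsl:when>
  <xsl:otherwise>
   <xsl:value-of select="$sozcuk"/>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>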
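
To make the roles of the intermediate variables concrete, the comment block below traces the templates for one word. It is an illustrative sketch that could be kept next to the templates as documentation; the sample word is an assumption, and '-' stands for the soft hyphen.

```xml
<!--
  Trace of he-ce-le for sozcuk = 'elektrik'

  call  sozcuk    ozel      a    bo       c    say  branch      emitted
  1     elektrik  alaktrak  e    laktrak  l    1    rule 1      e-
  2     lektrik   laktrak   le   ktrak    ktr  3    rule 2      lek-
  3     trik      trak      tri  k        ''   0    no vowel    trik

  Call 1 hands 'lektrik' to sonrasi, call 2 hands over 'trik'.
  In call 3 $bo contains no further vowel, so the first branch
  outputs the remaining text unchanged and the recursion stops.
  Result: e-lek-trik
-->
```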
 

