In this article we present an XSLT implementation of a hyphenation algorithm suitable for the Turkish language. We originally developed this algorithm as a QuarkXPress page layout application extension for the national daily newspaper SABAH.
While it is the programmer's job to query a text from the database and make it ready for consumption, decisions about the presentation of that text should be made and implemented independently by the graphic designer.
We will see how a text written in Turkish can be hyphenated and then presented using pure XSLT, so that no low-level programming is needed.
The following assumptions are made to keep the sample XSLT templates simple:
The input text is separated into words before it is hyphenated for presentation. For this purpose we use the metin-hecele XSLT template shown below and pass the text to be hyphenated to the template in the $metin parameter.
<xsl:template name="metin-hecele">
  <xsl:param name="metin"/>
  <xsl:choose>
    <!-- More than one word left: hyphenate the first, then recurse on the rest -->
    <xsl:when test="contains($metin,' ')">
      <xsl:call-template name="he-ce-le">
        <xsl:with-param name="sozcuk" select="substring-before($metin,' ')"/>
      </xsl:call-template>
      <xsl:text> </xsl:text>
      <xsl:call-template name="metin-hecele">
        <xsl:with-param name="metin" select="substring-after($metin,' ')"/>
      </xsl:call-template>
    </xsl:when>
    <!-- Last (or only) word -->
    <xsl:otherwise>
      <xsl:call-template name="he-ce-le">
        <xsl:with-param name="sozcuk" select="$metin"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
The template above splits the whole text into individual words at each space character and hyphenates the words one after another by calling the template named he-ce-le.
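As an illustration of how a designer's stylesheet might call this template, here is a minimal sketch. Two things in it are assumptions and do not come from the original code: the file name hecele.xsl (taken to be a stylesheet holding the templates from this article) and the source element name paragraf. The call to normalize-space() turns tabs and line breaks into single spaces, which matters because metin-hecele splits on the space character only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: metin-hecele, he-ce-le and sonrasi are stored in hecele.xsl -->
  <xsl:include href="hecele.xsl"/>
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Assumption: each paragraph of the source document is a <paragraf> element -->
  <xsl:template match="paragraf">
    <p>
      <xsl:call-template name="metin-hecele">
        <!-- Collapse all whitespace runs to single spaces before splitting -->
        <xsl:with-param name="metin" select="normalize-space(.)"/>
      </xsl:call-template>
    </p>
  </xsl:template>
</xsl:stylesheet>
```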
The Turkish hyphenation algorithm that we will implement relies on two things to calculate the positions of the soft hyphens within a word: the number of consonants between two vowels, and the position of the character r within that consonant sequence.
To find the positions of the vowels and of the r characters with a single operation, we use a special variable named $ozel. It temporarily holds a modified copy of the word to be hyphenated in which every vowel is replaced with a lower-case 'a' and every upper-case 'R' is replaced with its lower-case equivalent 'r'.
For the soft hyphen we use the Unicode character U+00AD (decimal 173).
Lastly, the $say variable stores the number of consonants between the two vowels being processed.
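To make $ozel and $say concrete, the following stand-alone sketch computes both for the sample word 'Türkçe' (the sample word and the text output are chosen here for illustration only). It finds the consonants between the first two vowels with substring-after and substring-before, which is a shortcut for this demonstration rather than the exact expressions used in he-ce-le.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <xsl:variable name="sozcuk" select="'Türkçe'"/>
    <!-- Every vowel becomes 'a'; an upper-case 'R' would become 'r' -->
    <xsl:variable name="ozel"
        select="translate($sozcuk,'aeıioöuüAEIİOÖUÜR','aaaaaaaaaaaaaaaar')"/>
    <!-- Consonants between the first and the second vowel -->
    <xsl:variable name="c"
        select="substring-before(substring-after($ozel,'a'),'a')"/>
    <xsl:value-of
        select="concat('ozel=',$ozel,' c=',$c,' say=',string-length($c))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any XML input, this should print ozel=Tarkça c=rkç say=3: the two vowels ü and e are both marked as 'a', and three consonants lie between them.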
The algorithm is simply described by the following four items:
The implementation of the above algorithm is shown below as an XSLT template named he-ce-le:
<xsl:template name="he-ce-le">
  <xsl:param name="sozcuk"/>
  <!-- Soft hyphen, U+00AD -->
  <xsl:param name="tire" select="'&#173;'"/>
  <!-- Shadow copy of the word: every vowel becomes 'a', 'R' becomes 'r' -->
  <xsl:variable name="ozel" select="translate($sozcuk, 'aeıioöuüAEIİOÖUÜR','aaaaaaaaaaaaaaaar')"/>
  <!-- $a: the word up to and including its first vowel -->
  <xsl:variable name="a" select="substring($sozcuk, 1 ,string-length(substring-before($ozel,'a'))+1)"/>
  <!-- $bo and $b: the remainder in shadow and in original spelling -->
  <xsl:variable name="bo" select="substring($ozel,string-length($a)+1)"/>
  <xsl:variable name="b" select="substring($sozcuk,string-length($a)+1)"/>
  <!-- $c: consonants before the next vowel; $say: their count -->
  <xsl:variable name="c" select="substring-before($bo,'a')"/>
  <xsl:variable name="say" select="string-length($c)"/>
  <xsl:choose>
    <!-- No further vowel: nothing left to hyphenate -->
    <xsl:when test="not(contains($bo,'a'))">
      <xsl:value-of select="$sozcuk"/>
    </xsl:when>
    <!-- Zero or one consonant: break right after the vowel -->
    <xsl:when test="$say=0 or $say=1">
      <xsl:value-of select="concat($a,$tire)" disable-output-escaping="yes"/>
      <xsl:call-template name="sonrasi">
        <xsl:with-param name="sozcuk" select="$b"/>
      </xsl:call-template>
    </xsl:when>
    <!-- Three consonants with 'r' third: break after the first consonant -->
    <!-- Consonants are copied from $b, not $c, so an upper-case 'R' keeps its case -->
    <xsl:when test="$say=3 and substring($bo,3,1)='r'">
      <xsl:value-of select="concat($a,substring($b,1,1),$tire)" disable-output-escaping="yes"/>
      <xsl:call-template name="sonrasi">
        <xsl:with-param name="sozcuk" select="substring($b,2)"/>
      </xsl:call-template>
    </xsl:when>
    <!-- Otherwise: break after the first half of the consonants, rounded up -->
    <xsl:otherwise>
      <xsl:variable name="yer" select="ceiling( $say div 2)"/>
      <xsl:value-of select="concat($a,substring($b,1,$yer),$tire)" disable-output-escaping="yes"/>
      <xsl:call-template name="sonrasi">
        <xsl:with-param name="sozcuk" select="substring($b,$yer + 1)"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
<xsl:template name="sonrasi">
  <xsl:param name="sozcuk"/>
  <xsl:variable name="ozel" select="translate($sozcuk, 'aeıioöuüAEIİOÖUÜ','aaaaaaaaaaaaaaaa')"/>
  <xsl:choose>
    <xsl:when test="contains($ozel,'a')">
      <xsl:call-template name="he-ce-le">
        <xsl:with-param name="sozcuk" select="$sozcuk"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$sozcuk"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
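Because soft hyphens are invisible, it can be handy to check the break points with a visible character. The stylesheet below is a condensed debugging variant and not the original code. It is a sketch under these assumptions: the template name he-ce-le-debug and the sample word are hypothetical, the three hyphenating branches are folded into a single $yer variable, the template recurses directly instead of going through sonrasi, and $tire is forwarded on every call. The templates above do not forward $tire, so their recursive calls always fall back to the default soft hyphen.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Entry point: hyphenate a fixed sample word with a visible '-' -->
  <xsl:template match="/">
    <xsl:call-template name="he-ce-le-debug">
      <xsl:with-param name="sozcuk" select="'elektrik'"/>
      <xsl:with-param name="tire" select="'-'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical variant of he-ce-le: same rules, $tire is passed along -->
  <xsl:template name="he-ce-le-debug">
    <xsl:param name="sozcuk"/>
    <xsl:param name="tire" select="'&#173;'"/>
    <xsl:variable name="ozel"
        select="translate($sozcuk,'aeıioöuüAEIİOÖUÜR','aaaaaaaaaaaaaaaar')"/>
    <xsl:variable name="a"
        select="substring($sozcuk,1,string-length(substring-before($ozel,'a'))+1)"/>
    <xsl:variable name="bo" select="substring($ozel,string-length($a)+1)"/>
    <xsl:variable name="b" select="substring($sozcuk,string-length($a)+1)"/>
    <xsl:variable name="say" select="string-length(substring-before($bo,'a'))"/>
    <!-- How many leading consonants of $b stay in front of the hyphen -->
    <xsl:variable name="yer">
      <xsl:choose>
        <xsl:when test="$say &lt;= 1">0</xsl:when>
        <xsl:when test="$say = 3 and substring($bo,3,1) = 'r'">1</xsl:when>
        <xsl:otherwise><xsl:value-of select="ceiling($say div 2)"/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:choose>
      <!-- No further vowel: emit the rest of the word unchanged -->
      <xsl:when test="not(contains($bo,'a'))">
        <xsl:value-of select="$sozcuk"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="concat($a,substring($b,1,$yer),$tire)"/>
        <xsl:call-template name="he-ce-le-debug">
          <xsl:with-param name="sozcuk" select="substring($b,$yer + 1)"/>
          <xsl:with-param name="tire" select="$tire"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Run against any XML input, this should print e-lek-trik. The first break follows the single-consonant rule (e, then l before the next vowel), and the second follows the rule for three consonants with r in third position (k, t, r), which keeps tr together at the start of the last syllable.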