DESIGN AND IMPLEMENTATION OF A MATHEMATICAL
ENCRYPTION MODEL FOR THE CENTRAL KURDISH FONT BASED ON UNICODE
Ziyad H. Abduljabbar a*,
Zeravan A. Ali a, Hanan A. Taher a
a Technical College of
Administration, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq - (ziyad.hazim, zeravan.ali, hanan.taher)@dpu.edu.krd
Received: 3 Mar., 2023 / Accepted: 22 May.,
2023 / Published: 4 June, 2023 https://doi.org/10.25271/sjuoz.2023.11.2.1126
This
research focuses on the development of encryption algorithms for the Kurdish
language, specifically tailored to the Kurdish alphabet. With the rapid growth
and digital advancements in the Kurdistan Region of Iraq, there is a pressing
need for accurate encryption methods that can be applied to Kurdish texts in
administration and digital governance. To address this need, a mathematical
encryption model is proposed, leveraging the Central Kurdish font supported by
Microsoft Windows to ensure compatibility between sender and receiver. The
model uses the Unicode representation of Kurdish letters to calculate the
offset and mod values. The proposed model is validated by implementing it with
the Caesar cipher method. Computations are performed in Excel, while the
encryption system is designed and programmed in C#. Testing the system with
diverse key values shows that it encrypts and decrypts Kurdish texts with high
accuracy. This research contributes to the field of encryption for the Kurdish
language, providing a scientific framework for further advancements in this area.
KEYWORDS: Kurdish
language, Unicode, Central Kurdish Font, Encryption, Decryption, Offset Value, Caesar
Method.
Encryption is one of the most important and powerful controls for the security of a computer system: text is transformed so that it is unintelligible and difficult for intruders to read.
With the rapid development of web technologies at the beginning of the twenty-first century, especially in the field of e-government, computer crime began to increase, and it became necessary to keep developing encryption algorithms to defend against breaches and to ensure confidentiality and security when storing and exchanging information (Pfleeger, Pfleeger, & Margulies, 2015; Stallings, 2006).
Amid this global development, the Kurdistan Region of Iraq has seen substantial growth in the use of electronic government, which now plays a major role in communicating with citizens. This growth increases the volume of electronic transactions over the network, which must be adequately protected (Shareef & Arreymbi, 2013). In addition, information technology has developed considerably in the universities of the Kurdistan Region of Iraq in recent years, and the information security course is now essential in many colleges and institutes. It is preferable, however, to use the Kurdish language alongside standard English in the practical side of information security training.
This paper first reviews the relevant literature and gives an overview of encryption for English letters using their ASCII, offset, and index values. It then introduces the development of standard character coding systems and presents the Central Kurdish font, the standard Kurdish font on which the research relies to ensure encryption accuracy. Next, it explains in full the steps required to design the mathematical encryption model and to calculate the offset and mod values for Kurdish letters, with tables detailing the mathematical steps and results and figures illustrating the steps.
The mod function ensures that letters wrap around when they are encrypted, so that they remain within the limits of the segment defined for the specific language.
The main
contributions of this research are as follows:
Development of a mathematical encryption model
for the Kurdish language, enabling accurate encryption and decryption
processes.
Utilization of the Kurdish central font,
supported by the Microsoft Windows operating system, to ensure compatibility and
accuracy in encryption.
Calculation of offset and mod values specific
to the Kurdish alphabet, ensuring integrity and security in encryption.
Implementation of the proposed model using the
Caesar cipher method, providing a practical application of the encryption
system.
Figure 1. General view of the system
Few studies focus specifically on encryption systems, especially those directly related to encrypting Kurdish texts. (Kako, 2018) presented related work on digital security and its role in protecting information privacy; several encryption algorithms were evaluated, and the significant role of Unicode in communication and its security was demonstrated. (Alkhudaydi & Gutub, 2021) proposed an effective system for hiding Arabic text based on two techniques: light-weight cryptography (LWC) and Arabic text steganography. (Shareef & Arreymbi, 2013) applied two modifications to the Playfair cipher algorithm, the first using Unicode and the second without Unicode, adopting a Romanization system that removes the natural characteristics of the Arabic language using the Knight Tour key. (Tawfiq, 2018) introduced an enhanced LSB substitution algorithm for hiding Kurdish text stored in a text file inside a digital picture, while (Maram, Gnanasekar, Manogaran, Balaanand, & Applications, 2019) emphasized the important role of Unicode in digital communication, as it covers about 120 languages, and relied on it to develop the UNICODE data privacy and security (UDPS) encryption algorithm to ensure data security in digital communication.
(AL-Shakarchy, AL-Shahad, & AL-Nasrawi, 2018) offered an encryption method that provides sufficient confidentiality based on Unicode and crossover, using a mapping table for the English alphabet in the plaintext and a mapping table for the Arabic alphabet in key generation. (Kako, 2018) and (Khairullah & Ratul, 2018) each developed an algorithm to encrypt and decrypt Kurdish letters using the decimal value of the letters to ensure the security of Kurdish communications. (Ahmed, Ahmed, Ahmed, & Science, 2015) aimed to hide information written in the Bengali language and stored in digital documents by adopting Unicode, whereas (AL-Nasrawi, Hashem, & Odhaib, 2014) focused on developing the Playfair encryption algorithm to support Kurdish text by using an array of size in order to increase privacy and authentication when messaging over an unreliable network. (Rashid, 2020) used the RSA encryption algorithm to develop an encryption system for both Kurdish and English text. Furthermore, (Shirali-Shahreza & Shirali-Shahreza, 2008) proposed a Unicode-based approach to hiding Arabic and Persian text in order to ensure confidential communication and prevent illegal copying and distribution of the text.
The main contribution of this work to the problem of encrypting Kurdish text is a mathematical encryption model that makes substitution encryption algorithms applicable to the Kurdish alphabet. The approach proved accurate in testing and addresses the existing limitations of encryption for the Kurdish alphabet.
An encryption system is a system concerned with encrypting and decrypting text. A general encryption system can be described by the following equations:
C = E(P) … (1) Encryption algorithm.
P = D(C) … (2) Decryption algorithm.
Where:
P = [P1, P2, …, Pn] … (3) String of plain text.
C = [C1, C2, …, Cn] … (4) String of cipher text.
Most substitution encryption methods, such as the Caesar, Vigenere, Affine, and Hill algorithms, rely on modular arithmetic (mod n), where (n) is the number of letters in the specific language. The calculations are therefore circular: if a result reaches or exceeds (n), it is reduced and wraps around to the beginning. For example, English consists of (26) letters, so (n) equals (26) and Z+2=B. Thus, all results of the mod operation in English lie in the range (0-25). Before applying the (mod) function with any of the above-mentioned encryption methods, it is important to obtain pure index values for the English letters in the sequence (0-25). This is done by subtracting the ASCII code of the first letter, referred to as the offset value, from the ASCII code of each letter. In English, the ASCII codes (65='A') and (97='a') are the offset values for uppercase and lowercase letters respectively (Table 1).
As shown in Table 1, the following two equations are applied to the uppercase and lowercase letters respectively:
1- Index (any capital letter) = ASCII code (that capital letter) - 65.
2- Index (any small letter) = ASCII code (that small letter) - 97.
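The two index equations and the wrap-around behaviour can be illustrated with a short Python sketch. The function names `letter_index` and `shift_letter` are hypothetical names chosen for this illustration; only the offsets (65, 97) and the modulus (26) come from the text.

```python
# Offsets taken from the text: ASCII code of 'A' and of 'a'.
UPPER_OFFSET = 65
LOWER_OFFSET = 97
N_ENGLISH = 26  # number of English letters, used as the modulus


def letter_index(ch: str) -> int:
    """Return the 0-25 index of an English letter."""
    if "A" <= ch <= "Z":
        return ord(ch) - UPPER_OFFSET
    if "a" <= ch <= "z":
        return ord(ch) - LOWER_OFFSET
    raise ValueError(f"not an English letter: {ch!r}")


def shift_letter(ch: str, key: int) -> str:
    """Shift one English letter by key positions, wrapping with mod 26."""
    offset = UPPER_OFFSET if ch.isupper() else LOWER_OFFSET
    return chr((letter_index(ch) + key) % N_ENGLISH + offset)


print(letter_index("A"), letter_index("Z"), letter_index("c"))  # 0 25 2
print(shift_letter("Z", 2))  # B, matching the Z+2=B example
```

Subtracting the offset before the mod operation and adding it back afterwards is what keeps the result inside the alphabet.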
Through this
research, a mathematical encryption model will be proposed to calculate the offset
and mod values for the Kurdish language based on central Kurdish font, in order
to rely on them in applying encryption methods on the Kurdish text (Ghauri,
2021; Hawezi, Azeez, & Qadir, 2019; Kareem, 2016).
Table 1.
Standard English letters with their ASCII code and index
Because the coding system is the basis of the encoding process, this research began with an exploratory survey of coding systems, their types and stages of development, and the system currently used to encode Kurdish letters. Development of the ASCII code began in 1960 and it was first standardized in 1963. It became the basis for representing symbols in computer memory, and it can represent 128 symbols (0..127) using (7 bits). Since ASCII covers only English letters, it was extended by IBM Corporation in 1981 into an extended ASCII code by adding one bit to make (8 bits), which doubled the number of symbols that can be represented to (256) characters (0..255). This was still not enough to represent many other languages of the world, such as Japanese, Arabic, and Kurdish, so the Unicode system was developed around 1990 (version 1.0 was published in 1991). Unicode is compatible with ASCII and assigns each character a code point in the range U+0000 to U+10FFFF (1,114,112 code points, which fit in 21 bits). This makes it possible to represent the letters of all the languages of the world, including Kurdish, whose letters lie in the Arabic block (U+0600 to U+06FF). With this benefit came another problem, namely a large memory footprint when every code point is stored at a fixed width of 32 bits: the letter (A), for example, which was represented by (1 byte) in the ASCII code system, then needs (4 bytes), which wastes a great deal of memory. To solve this problem, UTF-8 was invented by Ken Thompson on September 2, 1992. UTF-8 is a variable-length encoding of Unicode that allocates between one and four bytes to each character according to the range in which its code point falls, so ASCII characters still take one byte and Kurdish letters take two (Aleqabie, Al-Nasrawi, Al-Shakarchy, Alshahad, & Abd, 2019; Korpela, 2006; Miltner & Society, 2021; Pyeatt, 2016).
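The difference between fixed-width storage and UTF-8 can be checked directly. The sketch below is only an illustration: the helper name `encoded_sizes` is hypothetical, and the Kurdish letter U+0695 is an arbitrary example character.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the code point of ch and its size in bytes under two encodings."""
    return {
        "code_point": ord(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        # "utf-32-le" is used so that no byte-order mark is added to the count.
        "utf32_bytes": len(ch.encode("utf-32-le")),
    }


for ch in ("A", "\u0695"):  # Latin A and the Kurdish letter U+0695
    print(repr(ch), encoded_sizes(ch))
```

The Latin letter takes one byte in UTF-8 and the Kurdish letter takes two, while both take four bytes at a fixed width of 32 bits.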
In fact, many fonts are currently used on computer systems to write Kurdish letters, and it is better to choose a font that provides compatibility between the sender and the receiver as well as a standard encryption environment. Accordingly, this research uses the Central Kurdish font provided by Microsoft Windows, so the first step of the practical part is to install this keyboard. Once this is done, the Central Kurdish font is added and appears among the language options available on the taskbar (AL-Nasrawi et al., 2014; Korpela, 2006; Ramanathan, 2022), Figure 2.
Figure
2. Adding Central Kurdish font
The alphabet of the Kurdish language consists of 33 letters (Table 2).
The Unicode values of the Kurdish letters in the Central Kurdish font are not contiguous, as they are in English, but are interspersed with some letters and diacritical marks of the Arabic language. For this reason, the mathematical analysis in this research is based on the actual position of each Kurdish letter on the keyboard, as shown in “Fig. 3” and Table 3. Kurdish letters, like the letters of other languages, are located on the second, third and fourth rows of the keyboard. Accordingly, “Fig. 4” shows the steps of the proposed mathematical encryption model for Kurdish letters. The model determines the limits of the Unicode segment within which the Kurdish letters are located. This segment is defined by the two output values calculated by the model:
1. The offset value = 1569: which represents the minimum Unicode value for the Central Kurdish font. Therefore, the Index (any Kurdish letter) = Unicode (that letter) - 1569.
2. The mod value = 181: which is used in the mathematical equations of the encryption algorithms to ensure that the resulting value stays within the limits of the Kurdish segment, i.e. that the indices range over (0-180).
Table 4 shows the segment of Unicode values, expressed as indices (0 - 180), that was determined by the proposed model and contains the letters of the Kurdish language. “Fig. 5” presents the pseudo code for building the mathematical encryption model of the Kurdish language.
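As a rough illustration of how the two output values could be derived, the following Python sketch takes the smallest Unicode value as the offset and the width of the segment as the mod value. This is an assumed reading of the model, since the pseudo code of Fig. 5 is not reproduced here. The function name `segment_parameters` and the sample character list are hypothetical, and the sample is not the full keyboard layout of Table 3.

```python
def segment_parameters(keyboard_chars):
    """Return (offset, mod) for the Unicode segment spanned by the characters."""
    code_points = [ord(ch) for ch in keyboard_chars]
    offset = min(code_points)            # smallest Unicode value in the set
    mod = max(code_points) - offset + 1  # number of positions in the segment
    return offset, mod


# Hypothetical sample of keyboard characters, NOT the full Table 3.
# It includes the characters whose Unicode values are 1569 and 1749,
# the two ends of the segment (index 0 and index 180) reported in the text.
sample = ["\u0621", "\u0626", "\u067e", "\u0695", "\u06b5", "\u06ce", "\u06d5"]

offset, mod = segment_parameters(sample)
print(offset, mod)  # 1569 181
```

Only the two extreme characters determine the result, which is why the segment also contains Arabic letters and other characters that lie between them.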
Figure 3. Keyboard letters for the Central Kurdish font.
Table 3. Calculate the offset and mod values
Figure 4. Steps
required for the proposed mathematical encryption model for the Kurdish letters
Figure 5. Pseudo code for calculating the offset and mod values
After completing the design of the proposed mathematical encryption model, the next step is to implement and test it. The accuracy of encryption and decryption was verified by applying the model with the Caesar cipher method, using the offset and mod values produced by the proposed mathematical model (Maghrebi, Portigliatti, & Prouff, 2016). The general equations for encrypting and decrypting any Kurdish letter based on the proposed mathematical encryption model are:
CKL = E(PKL) = ((Unicode(PKL) - 1569) + key) mod 181 … (5) Encryption algorithm
PKL = D(CKL) = ((Unicode(CKL) - 1569) - key) mod 181 … (6) Decryption algorithm
Where:
PKL = Plain Kurdish Letter.
CKL = Cipher Kurdish Letter.
Key: Any numeric value.
The value (1569) represents the minimum Unicode value for all the characters in the Central Kurdish keyboard.
The value (181) represents the actual number of all characters in the segment that are located together with the Kurdish letters in the Central Kurdish keyboard.
Each equation yields an index in the range (0-180); adding the offset (1569) back to this index gives the Unicode value of the resulting letter.
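The two equations can be sketched in Python as follows. This is an illustrative sketch and not the authors' C# implementation: the function names are hypothetical, and leaving characters outside the segment (spaces, digits, Latin letters) unchanged is an assumption, since the text does not state how such characters are handled.

```python
OFFSET = 1569  # minimum Unicode value of the segment, from the text
MOD = 181      # number of positions in the segment, from the text


def shift_char(ch: str, key: int) -> str:
    """Shift one character inside the segment; leave anything else unchanged."""
    index = ord(ch) - OFFSET
    if not 0 <= index < MOD:
        return ch  # assumption: characters outside the segment pass through
    return chr((index + key) % MOD + OFFSET)


def encrypt(plain: str, key: int) -> str:
    """Apply the encryption equation to every character of the text."""
    return "".join(shift_char(ch, key) for ch in plain)


def decrypt(cipher: str, key: int) -> str:
    """Apply the decryption equation, i.e. shift by the negated key."""
    return "".join(shift_char(ch, -key) for ch in cipher)


plain = "\u0626\u06d5\u0632"  # first word of the test sentence
for key in (3, 75, 120, 523):
    cipher = encrypt(plain, key)
    assert decrypt(cipher, key) == plain  # round trip restores the plain text
    print(key, [ord(c) for c in cipher])
```

Python's `%` operator returns a non-negative result for a positive modulus, so the subtraction in decryption wraps correctly. In C#, `%` keeps the sign of the left operand, so an implementation there would need an adjustment such as `((x % 181) + 181) % 181`.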
The steps required for encryption and
decryption can be illustrated in the two flowcharts shown in Figure 6 and Figure
7 respectively.
Table 4. The boundaries of the Unicode segment (indices 0 - 180) containing the letters of the Kurdish language, as determined by the proposed model.
Figure 6.
Steps for the Encryption Process
Figure 7. Steps
for the Decryption Process
Table 5. Required calculation for proposed mathematical encryption model
Table 6. Required calculation for proposed mathematical encryption model
Figure 8.
Encrypt and Decrypt Kurdish text with primary key value =3
Figure 9. Encrypt and Decrypt Kurdish text with primary
key value= 75
Figure 10.
Encrypt and Decrypt Kurdish text with primary key value=120
Figure 11.
Encrypt and Decrypt Kurdish text with primary key value=523
Excel was used to perform the calculations required by the mathematical encryption model, including the offset and mod values. Tables 5 and 6 above present the encryption and decryption steps respectively. Both tables have the same number of columns because the arithmetic operations in the two steps are similar. The only difference is in the fifth column, where the encryption process adds the key value whereas the decryption process subtracts it (Rajasekharaiah, Dule, & Sudarshan, 2020; Thakur, Qiu, Gai, & Ali, 2015). The two tables are divided into two main parts:
Part one:
This part includes the columns from (col#1) to (col#4), whose role is to receive each letter of the Kurdish text and convert it into an index number ranging from (0-180), as follows:
1- Column (col#1): the sequence number of the letter within the Kurdish text.
2- Column (col#2): the relevant letter within the Kurdish text.
3- Column (col#3): converts the Kurdish character into its corresponding Unicode value.
4- Column (col#4): subtracts the offset value (1569) from the Unicode value to give an index ranging between (0-180).
Part two:
This part includes the columns from (col#5) to (col#7). These columns complete the encryption (or decryption) process and convert the Unicode value back to the corresponding letter, as follows:
1- Column (col#5): adds (or subtracts) the value of the key, and then applies the mod operation by (181) to ensure that the values wrap within (0-180) after the calculation.
2- Column (col#6): adds back the subtracted offset value (1569) to get the actual Unicode value.
3- Column (col#7): converts the Unicode value to its corresponding letter.
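The seven columns can be reproduced outside Excel with a short Python sketch. The function name `table_rows` and its `decrypting` parameter are hypothetical, and skipping characters that fall outside the segment is an assumption made for this illustration.

```python
OFFSET = 1569  # offset value from the text
MOD = 181      # mod value from the text


def table_rows(text: str, key: int, decrypting: bool = False):
    """Build one row (col#1 to col#7) per character inside the segment."""
    step = -key if decrypting else key
    rows = []
    for position, letter in enumerate(text, start=1):  # col#1, col#2
        unicode_value = ord(letter)                    # col#3
        index = unicode_value - OFFSET                 # col#4
        if not 0 <= index < MOD:
            continue  # assumption: characters outside the segment are skipped
        shifted = (index + step) % MOD                 # col#5
        new_unicode = shifted + OFFSET                 # col#6
        rows.append((position, letter, unicode_value, index,
                     shifted, new_unicode, chr(new_unicode)))  # col#7 is last
    return rows


for row in table_rows("\u0626\u06d5\u0632", key=3):
    print(row)
```

Calling the same function with `decrypting=True` on the letters in the last column gives the rows of the decryption table, since only the sign of the key changes in col#5.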
The C# programming language was used to design and code the encryption system. The interface of the encryption system is shown in “Fig. 8”, “Fig. 9”, “Fig. 10” and “Fig. 11” for the key values (3, 75, 120 and 523) respectively, with the Kurdish sentence (ئەز زمانێ کوردی نزانم ئەز خو فێری زمانێ کوردی دکەم).
From Tables 5 and 6 and from “Fig. 8”, “Fig. 9”, “Fig. 10” and “Fig. 11”, the following can be concluded:
1- By using the Unicode value for the actual location of each Kurdish letter on the keyboard, the mathematical encryption operations do not require the Kurdish letters to be sequential.
2- The number of characters (181) does not reflect the actual number of Kurdish letters, as they are mixed with Arabic letters and a number of other characters.
3- With different values of the primary key (large or small), the encryption process had the same high efficiency.
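One consequence of the mod operation for large keys can be checked with a small sketch. It follows from the arithmetic of the equations rather than from a result reported in the paper, and the helper name `shift` is hypothetical.

```python
OFFSET, MOD = 1569, 181  # offset and mod values from the text


def shift(text: str, key: int) -> str:
    """Caesar shift inside the segment; other characters are left unchanged."""
    return "".join(
        chr((ord(ch) - OFFSET + key) % MOD + OFFSET)
        if 0 <= ord(ch) - OFFSET < MOD else ch
        for ch in text
    )


word = "\u06a9\u0648\u0631\u062f\u06cc"  # a word from the test sentence
print(523 % MOD)                                   # effective shift of key 523
print(shift(word, 523) == shift(word, 523 % MOD))  # same cipher text
print(shift(word, MOD) == word)                    # a key of 181 changes nothing
```

A key of 523 therefore behaves like a key of 161, and any key that is a multiple of 181 leaves the text unchanged, so only 180 distinct non-trivial shifts exist.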
The encryption system for the Kurdish language was designed and programmed based on the mathematical encryption model proposed in this research. The Central Kurdish font was used to ensure compatibility between the sender and receiver and to obtain high encryption accuracy. The mathematical encryption model relied on the Unicode values of the Kurdish letters to calculate the offset and mod values. To verify the accuracy of the model, it was implemented using the Caesar cipher method. The encryption system was tested by encrypting several Kurdish texts using different key values, and the results showed high accuracy.
The research recommends further developing the mathematical model so that it can encrypt Kurdish texts mixed with English texts, and applying the results of the proposed mathematical model to other encryption algorithms.
We express our full thanks to Duhok Polytechnic University (DPU). We would also like to express our gratitude to the head of the ITM department, Dr. Amira Bibo Sallow, for her guidance and support.
The authors have declared that no competing interests exist.
No external funding, financial support, or grants were received for this research project. The study was conducted independently and was self-funded.
Ahmed, O. H., Ahmed, A. M., Ahmed, S. H. J. I.
J. o. E., & Science, C. (2015). Improving playfair algorithm to support
user verification and all the languages in the world including kurdish
language. 4(8), 14058-14062.
AL-Nasrawi,
D. A., Hashem, H. A., & Odhaib, M. A. J. E. C. S. J. (2014). Unicode text
editor for ancient Egyptian hieroglyphs writing system. 38(2), 48-55.
AL-Shakarchy,
N. D., AL-Shahad, H. F., & AL-Nasrawi, D. A. (2018). Cryptographic system
based on Unicode. Paper presented at the Journal of Physics: Conference Series.
Aleqabie,
H. J., Al-Nasrawi, D., Al-Shakarchy, N., Alshahad, H., & Abd, E. (2019).
New Cryptographic System of Romanized Arabic Text Based on Modified Playfiar. Journal
of Engineering and Applied Sciences, 14. doi:10.36478/jeasci.2019.1331.1338
Alkhudaydi,
M., & Gutub, A. J. S. C. S. (2021). Securing data via cryptography and
Arabic text steganography. 2, 1-18.
Ghauri,
F. (2021). DIGITAL SECURITY VERSUS PRIVATE INFORMATION.
Hawezi,
R. S., Azeez, M. Y., & Qadir, A. A. (2019). Spell checking algorithm for
agglutinative languages “Central Kurdish as an example”. Paper presented at the
2019 International Engineering Conference (IEC).
Kako, N.
A. (2018). Classical Cryptography for Kurdish Language. Paper presented at the
4th International Engineering Conference on Developments in Civil &
Computer Engineering Applications (IEC2018).
Kareem,
R. A. (2016). The syntax of verbal inflection in Central Kurdish. Newcastle
University.
Khairullah,
M., & Ratul, M. (2018). Steganography in Bengali Unicode Text. 27.
Korpela,
J. K. (2006). Unicode explained: O'Reilly Media, Inc.
Maghrebi,
H., Portigliatti, T., & Prouff, E. (2016). Breaking cryptographic
implementations using deep learning techniques. Paper presented at the
Security, Privacy, and Applied Cryptography Engineering: 6th International
Conference, SPACE 2016, Hyderabad, India, December 14-18, 2016, Proceedings 6.
Maram,
B., Gnanasekar, J., Manogaran, G., Balaanand, M. J. S. O. C., &
Applications. (2019). Intelligent security algorithm for UNICODE data privacy
and security in IOT. 13, 3-15.
Miltner,
K. M. J. N. M., & Society. (2021). “One part politics, one part technology,
one part history”: Racial representation in the Unicode 7.0 emoji set. 23(3),
515-534.
Pfleeger,
C. P., Pfleeger, S. L., & Margulies, J. (2015). Security in Computing:
Pearson Education.
Pyeatt,
L. (2016). Modern assembly language programming with the ARM processor: Newnes.
Rajasekharaiah,
K., Dule, C. S., & Sudarshan, E. (2020). Cyber security challenges and its
emerging trends on latest technologies. Paper presented at the IOP Conference
Series: Materials Science and Engineering.
Ramanathan,
A. J. J. o. O. S. S. (2022). Unishox: A hybrid encoder for short unicode
strings. 7(69), 3919.
Rashid,
F. J. A. a. S. (2020). Design and implementation a new approach for enhancing
encryption and decryption mechanisms.
Shareef,
S., & Arreymbi, J. (2013). E-Government Initiatives in Kurdistan Region of
Iraq: A Citizen-Centric Approach. In (pp. 1-33).
Shirali-Shahreza,
M., & Shirali-Shahreza, S. (2008, 8-10 Sept. 2008). Persian/Arabic Unicode
Text Steganography. Paper presented at the 2008 The Fourth International
Conference on Information Assurance and Security.
Stallings,
W. (2006). Cryptography and Network Security: Principles and Practice:
Pearson/Prentice Hall.
Tawfiq,
N. E. J. A. J. o. N. U. (2018). Modified Lsb For Hiding Encrypted Kurdish Text
Into Digital Image. 7(4), 254-260.
Thakur,
K., Qiu, M., Gai, K., & Ali, M. L. (2015). An investigation on cyber
security threats and security models. Paper presented at the 2015 IEEE 2nd
international conference on cyber security and cloud computing.