A NEW CONJUGATE GRADIENT METHOD BASED ON LOGISTIC MAPPING FOR UNCONSTRAINED OPTIMIZATION AND ITS APPLICATION IN REGRESSION ANALYSIS
Sarwar Ahmad Hamad 1, Dlovan Haji Omar 1, Diman Abdulqader Sulaiman 1, Alaa Luqman Ibrahim 1,*
1 Department of Mathematics, College of Science, University of Zakho, Zakho, Kurdistan Region, Iraq
Corresponding Author Email: alaa.ibrahim@uoz.edu.krd
Received: 11 May. 2024 / Accepted: 28 Jul., 2024 / Published: 14 Nov., 2024. https://doi.org/10.25271/sjuoz.2024.12.3.1310
ABSTRACT:
The study tackles the critical need for efficient optimization techniques in unconstrained optimization problems, where conventional techniques often suffer from slow and inefficient convergence. Despite advances in gradient-based techniques, there is still a need for algorithms that strike a balance between computational efficiency and robustness. This work introduces a novel conjugate gradient algorithm based on the logistic mapping formula. As part of the methodology, descent conditions are established, and the suggested algorithm's global convergence properties are thoroughly examined. Comprehensive numerical experiments are used for empirical validation, and the new algorithm is compared to the Polak-Ribière-Polyak (PRP) algorithm. The results show that the suggested approach outperforms the PRP algorithm and is more efficient, since it needs fewer function evaluations and iterations to reach convergence. Furthermore, the usefulness of the suggested approach is demonstrated by its application to regression analysis, notably in modelling population estimates for the Kurdistan Region of Iraq. In contrast to conventional least squares techniques, the method maintains low relative error rates while producing accurate predictions. Overall, this study presents the novel conjugate gradient algorithm as an effective tool for handling challenging optimization problems in both theoretical and real-world contexts.
KEYWORDS: optimization; conjugate gradient; step size; regression analysis.
Unconstrained optimization problems involve minimizing an objective function that depends only on real variables, with no restrictions on the variables' values. This can be expressed mathematically as:

$\min f(x), \quad x \in \mathbb{R}^n$,  (1.1)

where $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is a continuously differentiable function and $g_k = \nabla f(x_k)$ is the gradient of $f$ at the point $x_k$. The conjugate gradient method (CG) is an optimization
algorithm that lies between the steepest descent method and the Newton method
in terms of computational complexity and convergence behaviour. Unlike the
steepest descent method, which updates the solution solely in the direction of
the negative gradient, the (CG) method modifies this direction by incorporating
a positive linear combination of the previous search direction. This adjustment
helps to overcome the steepest descent method's limitation of slow convergence.
The (CG) method only requires the computation of first-order derivatives,
specifically the gradient of the objective function, which significantly
reduces computational cost compared to methods that require second-order
derivatives, such as the Newton method. The Newton method involves calculating
and inverting the Hessian matrix, a process that is computationally expensive
and often impractical for large-scale problems. In contrast, a key advantage of
the (CG) method is that it does not require the Hessian matrix or its
approximation, making it particularly well-suited for large-scale optimization
challenges.
The CG algorithm typically generates a sequence of iterates as:

$x_{k+1} = x_k + \alpha_k d_k$,  (1.2)

where $\alpha_k > 0$ is the step length and $d_k$ is the search direction given by:

$d_0 = -g_0$ and $d_{k+1} = -g_{k+1} + \beta_k d_k$,  (1.3)

The parameter $\beta_k$ is crucial, with different choices leading to various CG methods. Over the years, numerous variants of this scheme have been proposed and widely applied in practice, such as the Fletcher-Reeves (FR) (Fletcher & Reeves, 1964), Polak-Ribière-Polyak (PRP) (Polak & Ribiere, 1969; Polyak, 1969), Hestenes-Stiefel (HS) (Hestenes & Stiefel, 1952), Liu-Storey (LS) (Liu & Storey, 1991), Dai-Yuan (DY) (Dai & Yuan, 1999), and Conjugate Descent (CD) (Fletcher, 1987) methods:
Fletcher-Reeves (1964): $\beta_k^{FR} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}$.
Polak-Ribière-Polyak (1969): $\beta_k^{PRP} = \dfrac{g_{k+1}^T (g_{k+1} - g_k)}{\|g_k\|^2}$.
Hestenes-Stiefel (1952): $\beta_k^{HS} = \dfrac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T (g_{k+1} - g_k)}$.
Liu-Storey (1991): $\beta_k^{LS} = -\dfrac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T g_k}$.
Dai-Yuan (1999): $\beta_k^{DY} = \dfrac{\|g_{k+1}\|^2}{d_k^T (g_{k+1} - g_k)}$.
Conjugate Descent (1987): $\beta_k^{CD} = -\dfrac{\|g_{k+1}\|^2}{d_k^T g_k}$.
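For readers who prefer code, the following is a minimal sketch (not the paper's FORTRAN 95 implementation) that evaluates these classical β choices from a pair of consecutive gradients and the previous search direction; the vectors used are arbitrary placeholders.

```python
import numpy as np

def classical_betas(g_new, g_old, d_old):
    """Classical conjugate gradient parameters beta_k (illustrative only)."""
    y = g_new - g_old                       # gradient difference y_k
    return {
        "FR": g_new @ g_new / (g_old @ g_old),
        "PRP": g_new @ y / (g_old @ g_old),
        "HS": g_new @ y / (d_old @ y),
        "LS": -(g_new @ y) / (d_old @ g_old),
        "DY": g_new @ g_new / (d_old @ y),
        "CD": -(g_new @ g_new) / (d_old @ g_old),
    }

# Example with arbitrary placeholder vectors
g_old = np.array([1.0, -2.0, 0.5])
g_new = np.array([0.4, -1.1, 0.2])
d_old = -g_old                              # first direction is the steepest descent direction
print(classical_betas(g_new, g_old, d_old))
```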
Numerous researchers have dedicated their efforts to refining (CG) methods, motivated by their widespread adoption in solving optimization problems, as well as their properties of global convergence and low memory utilization. Interestingly, most of these efforts are directed towards improving conventional CG methods, which are the first generation of CG algorithms. For more details, see (Ibrahim & Mohammed, 2022, 2024; Ibrahim & Shareef, 2019; Jahwar et al., 2024; Shareef & Ibrahim, 2016).
Since classical CG methods have well-established convergence properties and fundamental principles, they serve as the basis for all subsequent variants. In spite of their success, these methods have stimulated a wealth of research focused on particular issues, such as enhancing robustness, scalability, and convergence rates for high-dimensional problems. When $f$ is a strongly convex quadratic function and the line search is exact, the gradients are mutually orthogonal and the parameters coincide, so all of these methods are equivalent. However, their behaviour differs significantly when applied to general nonlinear functions with inexact line searches. Although the strong convergence properties of the FR, DY, and CD algorithms are well known, jamming may cause them to perform poorly in real-world scenarios. Conversely, the PRP, HS, and LS approaches often perform better in practice, even though they may fail to converge in general.
Naturally, researchers strive to develop new techniques that combine the best features of these two categories, and numerous hybrid approaches have been proposed. For instance, Touati-Ahmed and Storey (Touati-Ahmed & Storey, 1990) introduced a hybrid conjugate gradient algorithm that combines the FR and PRP techniques in 1990. Subsequently, Hu and Storey (Hu & Storey, 1991), along with Gilbert and Nocedal (Gilbert & Nocedal, 1992), explored other hybrid schemes related to the PRP and FR techniques. Dai and Yuan (Dai & Yuan, 2001) combined the DY technique with the HS method to create two hybrid CG algorithms aimed at enhancing the practical application of the DY method. For large-scale problems in unconstrained optimization, Andrei (Andrei, 2008) presented a novel hybrid CG approach, referred to as the HYBRID algorithm, which is based on the HS and DY methods. A primary characteristic of this hybrid method is that the search direction is the Newton direction. Remarkably, this hybrid technique often outperforms certain complex conjugate gradient methods in various applications.
This research focuses on how new improvements in conjugate gradient methods can contribute to enhanced performance of optimization algorithms, and specifically on how techniques such as logistic mapping can enhance these methods.
The objective of this study is to develop a new conjugate gradient method incorporating logistic mapping and to analyze its performance compared to traditional methods. The focus is on improving convergence rates and robustness, particularly in applications related to regression analysis.
The motivation behind this study is to address the need for more efficient optimization techniques capable of handling large-scale problems and real-world scenarios effectively.
The challenges include ensuring the robustness of the new method, addressing convergence issues, and validating its effectiveness in practical applications.
This study highlights how logistic mapping can improve conjugate gradient methods, offering new insights into optimization by enhancing convergence rates and expanding practical applications.
The structure of the paper is as follows: in the next section, we present the proposed technique and the rule used to determine the parameter $\beta_k$. Under appropriate conditions, the descent and sufficient descent properties of the suggested approach are established, together with its global convergence. In Section 3, preliminary numerical results are presented. We conclude with a summary of the article.
This section presents a new CG method for solving (1.1). The CG parameter of the algorithm is based on the logistic mapping formula (Lu et al., 2006), which is widely used in optimization. By combining the logistic mapping with the CG parameter of Polak-Ribière-Polyak (PRP), the performance of the algorithm can be enhanced. From the logistic mapping formula, we have
, (2.1)
To achieve balance, we multiply the second term of equation (2.1) by a scalar and obtain:
, (2.2)
where and
After some algebraic operations, we get:
. (2.3)
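For context, the classical logistic mapping recurrence is $x_{n+1} = \mu x_n (1 - x_n)$ with $\mu \in (0, 4]$. The short sketch below only iterates this standard recurrence; it does not reproduce the specific CG parameter defined in (2.3), whose exact form is given by the derivation above.

```python
def logistic_map(x0, mu=4.0, steps=10):
    """Iterate the standard logistic mapping x_{n+1} = mu * x_n * (1 - x_n).

    Note: this only illustrates the chaotic recurrence the new beta is built on;
    the paper's specific beta_k in (2.3) is not reproduced here.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(mu * xs[-1] * (1.0 - xs[-1]))
    return xs

print(logistic_map(0.3, mu=4.0, steps=5))
```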
The steps of the new algorithm are as follows:
Step (1): Initialization. Begin with an initial point $x_0 \in \mathbb{R}^n$.
Step (2): Gradient computation. Set $k = 0$ and compute the gradient $g_0 = \nabla f(x_0)$. Define the search direction $d_0 = -g_0$. If $\|g_0\| = 0$, terminate the algorithm.
Step (3): Line search. Determine the step length $\alpha_k$ that minimizes the objective function along $d_k$ by using a cubic line search.
Step (4): Update. Update the iterate: $x_{k+1} = x_k + \alpha_k d_k$.
Step (5): Gradient update and termination check. Compute $g_{k+1} = \nabla f(x_{k+1})$. If $\|g_{k+1}\| \le \epsilon$, then stop.
Step (6): Conjugacy update. Calculate $\beta_k$ based on the rule (2.3).
Step (7): Conjugate direction update. Update the search direction: $d_{k+1} = -g_{k+1} + \beta_k d_k$.
Step (8): Convergence and restart check. If the restart criterion is satisfied, return to Step 2; otherwise, increment $k$ and return to Step 3.
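As a rough illustration of these steps (not the authors' FORTRAN 95 implementation), the sketch below runs the CG loop with the β rule supplied as a callable. A Wolfe-type line search from SciPy stands in for the cubic line search used in the paper, and the PRP rule is used only as a placeholder, since the paper's β from (2.3) is not reproduced here; the names cg_minimize and prp are introduced purely for this example.

```python
import numpy as np
from scipy.optimize import line_search

def cg_minimize(f, grad, x0, beta_rule, tol=1e-6, max_iter=1000):
    """Generic nonlinear CG loop following Steps 1-8 (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                          # Step 2: first direction is steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:                # Step 5: termination check
            break
        alpha = line_search(f, grad, x, d, gfk=g)[0]   # Step 3: Wolfe line search (stand-in)
        if alpha is None:                           # fall back if the line search fails
            alpha = 1e-4
        x_new = x + alpha * d                       # Step 4: update the iterate
        g_new = grad(x_new)
        beta = beta_rule(g_new, g, d)               # Step 6: conjugacy parameter
        d = -g_new + beta * d                       # Step 7: new search direction
        x, g = x_new, g_new
    return x, k

# PRP rule used as a placeholder for the paper's beta in (2.3)
prp = lambda g_new, g_old, d_old: g_new @ (g_new - g_old) / (g_old @ g_old)

rosenbrock = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
rosen_grad = lambda x: np.array([
    -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
    200.0 * (x[1] - x[0] ** 2),
])

print(cg_minimize(rosenbrock, rosen_grad, [-1.2, 1.0], prp))
```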
Theorem 1. Let $\{g_k\}$ and $\{d_k\}$ be two sequences generated by the new method. Then the search direction satisfies the descent condition $g_{k+1}^T d_{k+1} < 0$.
Proof: Using (1.3) and (2.3), we obtain
, (2.4)
Multiplying both sides of this equation by $g_{k+1}^T$, we obtain:
, (2.5)
When the step length $\alpha_k$ is determined through an exact line search, we have $g_{k+1}^T d_k = 0$, and the resulting equation reduces to $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 < 0$, so the descent condition is validated.
In the case of an inexact line search, where $g_{k+1}^T d_k \neq 0$, we get

(2.6)
The affirmation that the initial two terms of equation (2.6) adhere to non-positivity stems from the fulfilment of the descent condition by the Polak-Ribière-Polyak (PRP) algorithm. This condition is a fundamental criterion in optimization theory, ensuring that iterative algorithms make progress towards minimizing the objective function.
The search direction of the PRP method satisfies the descent condition $g_k^T d_k < 0$.
It signifies that the chosen descent directions align with the objective of function minimization, thereby facilitating convergence. This condition serves as a crucial safeguard against divergent behaviour and ensures the convergence trajectory of the algorithm.
The adherence of the PRP algorithm to the descent condition engenders confidence in subsequent analytical derivations, particularly in equation (2.6). By establishing the negativity of the initial terms, the proof establishes a robust foundation for affirming the convergence of the CG method, even in scenarios involving inexact line search methodologies.
It is also clear that the remaining quantities are positive, so the third term of equation (2.6) is less than or equal to zero. Hence, the descent condition $g_{k+1}^T d_{k+1} < 0$ is proved.
Theorem 2. Let $\{g_k\}$ and $\{d_k\}$ be two sequences generated by the new method. Then the search direction satisfies the sufficient descent condition $g_{k+1}^T d_{k+1} \le -c \|g_{k+1}\|^2$ for some constant $c > 0$.
Proof: The two initial terms of equation (2.6) are demonstrably less than or equal to zero due to the properties of the Polak-Ribière-Polyak (PRP) algorithm, which satisfies the descent condition. Therefore, we obtain the required bound; letting $c > 0$ denote the corresponding constant, we have $g_{k+1}^T d_{k+1} \le -c \|g_{k+1}\|^2$, which completes the proof.
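Although the specific β from (2.3) is not reproduced here, a sufficient descent condition of this form is easy to check numerically for any candidate rule. The sketch below is only an illustration under assumptions: it uses the PRP rule as a stand-in, an arbitrary constant c, and placeholder gradients, and simply tests whether $g_{k+1}^T d_{k+1} \le -c\|g_{k+1}\|^2$ holds for one iterate.

```python
import numpy as np

def sufficient_descent_holds(g_new, d_new, c=0.1):
    """Check g_{k+1}^T d_{k+1} <= -c * ||g_{k+1}||^2 for one iterate (illustrative)."""
    return float(g_new @ d_new) <= -c * float(g_new @ g_new)

# Stand-in values: a PRP direction built from placeholder gradients
g_old = np.array([1.0, -2.0, 0.5])
g_new = np.array([0.4, -1.1, 0.2])
d_old = -g_old
beta_prp = g_new @ (g_new - g_old) / (g_old @ g_old)
d_new = -g_new + beta_prp * d_old
print(sufficient_descent_holds(g_new, d_new, c=0.1))
```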
In this section, we establish the global convergence of the new method by relying on the following basic assumptions regarding the objective function.
Assumption (*) (Zoutendijk, 1970):
1. Lower bound: The objective function $f$ is bounded below on $\mathbb{R}^n$, and the level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.
2. Continuity and differentiability: The objective function $f$ is continuously differentiable on $\mathbb{R}^n$.
3. Gradient bound: In some neighbourhood $N$ of $S$, $f$ is continuously differentiable and its gradient is Lipschitz continuous with Lipschitz constant $L > 0$, i.e. $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$.
From the above assumptions, it follows that there exists a positive constant $\Gamma$ such that

$\|g(x)\| \le \Gamma$ for all $x \in S$.  (2.7)
If $f$ is uniformly convex, there exists a constant $\mu > 0$ such that:

$(g(x) - g(y))^T (x - y) \ge \mu \|x - y\|^2$ for all $x, y \in S$,  (2.8)

which we can rewrite in the following manner:

$y_k^T s_k \ge \mu \|s_k\|^2$, where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$.  (2.9)
These assumptions are standard in optimization theory and provide a foundation for demonstrating the convergence properties of the proposed method. Specifically, the continuity and differentiability assumption ensure the smoothness of the objective function, while the Lipschitz condition on the gradient guarantees that changes in the gradient are controlled. Finally, the lower bound condition ensures that the optimization process does not diverge to −∞, thus supporting the argument for global convergence.
Lemma 1 (Zhang et al., 2006). Assume that the aforementioned conditions hold. Consider the methods (1.2) and (1.3), where $d_k$ is a descent direction and $\alpha_k$ satisfies the standard Wolfe line search. If $\sum_{k \ge 0} \dfrac{1}{\|d_k\|^2} = \infty$, then $\liminf_{k \to \infty} \|g_k\| = 0$.
Theorem 3. If the sequences $\{x_k\}$, $\{g_k\}$, and $\{d_k\}$ are generated by our algorithm and the assumptions (*) hold, then $\liminf_{k \to \infty} \|g_k\| = 0$.
Proof: From equations (1.3) and (2.3), we have
, (2.10)
,
since,
, (2.11)
from the Lipschitz condition and by using (2.9), we get
,
,
Since,
.
Hence,
,
,
,
By using Lemma 1, we get $\liminf_{k \to \infty} \|g_k\| = 0$. The proof is complete.
This section presents the numerical results of the new method for unconstrained optimization and its application to regression analysis.
3.1 The Unconstrained Optimization
The objective of this subsection is to assess the efficacy of our novel method in addressing optimization challenges, compared against the Polak-Ribière-Polyak (PRP) method. We employ a comparative analysis using well-established nonlinear test problems of varying dimensionality (with n ranging from 4 to 5000). Each implementation is written in FORTRAN 95 to ensure precision and reliability. Central to our evaluation is the use of the cubic interpolation method within the line search procedure, which leverages both function and gradient values to navigate the optimization landscape effectively. The key performance metrics, namely the number of iterations (NOI) and the number of function evaluations (NOF), are documented in Table 1. The experimental results, as summarized in Tables 1 and 2, underscore the superiority of the proposed technique over the PRP method in terms of both NOF and NOI, reaffirming the efficacy and efficiency of our approach in comparison to the established PRP method. This evaluation not only provides empirical validation of the method's effectiveness but also underscores its potential to advance the state of the art in optimization methodologies.
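The following sketch is only an illustration of how NOI- and NOF-style counts can be gathered; it is not the paper's FORTRAN 95 setup. It wraps the objective with a call counter and runs SciPy's built-in nonlinear CG on one common form of the extended Rosenbrock test function, both of which are assumptions made here for the example.

```python
import numpy as np
from scipy.optimize import minimize

def make_counted(fun):
    """Wrap a function so that every call is counted (to report NOF-style totals)."""
    calls = {"n": 0}
    def wrapped(x):
        calls["n"] += 1
        return fun(x)
    return wrapped, calls

def extended_rosenbrock(x):
    # One common form of the extended Rosenbrock test function (an assumption here)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

f_counted, f_calls = make_counted(extended_rosenbrock)
x0 = np.full(100, -1.2)
# SciPy's built-in nonlinear CG stands in for the compared methods; since no analytic
# gradient is supplied, finite-difference evaluations are included in the call count.
result = minimize(f_counted, x0, method="CG", options={"maxiter": 2000})
print("NOI:", result.nit, "NOF (function calls):", f_calls["n"])
```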
Table 1: The results of the new method compared with the Polak-Ribière-Polyak (PRP) method.

Test Function | n | PRP NOI | PRP NOF | New NOI | New NOF
Wolfe | 5, 10, 100, 500, 1000, 5000 | 14, 32, 49, 58, 64, 99 | 29, 65, 99, 117, 129, 214 | 12, 25, 42, 48, 48, 83 | 20, 50, 80, 90, 90, 161
Mile | 5, 10, 100, 500, 1000, 5000 | 37, 37, 44, 44, 50, 50 | 116, 116, 148, 148, 180, 180 | 29, 29, 29, 29, 50, 50 | 89, 89, 89, 89, 180, 180
Central | 5, 10, 100, 500, 1000, 5000 | 22, 22, 22, 23, 23, 30 | 159, 159, 159, 171, 171, 270 | 22, 22, 22, 23, 23, 22 | 159, 159, 159, 170, 170, 159
Powell | 5, 10, 100, 500, 1000, 5000 | 40, 40, 43, 46, 46, 50 | 120, 120, 135, 150, 150, 180 | 30, 32, 32, 32, 38, 38 | 80, 90, 90, 90, 97, 97
Sum | 5, 10, 100, 500, 1000, 5000 | 6, 6, 14, 21, 23, 31 | 39, 34, 80, 123, 127, 145 | 6, 6, 11, 17, 17, 25 | 39, 34, 59, 86, 86, 120
Wood | 5, 10, 100, 500, 1000, 5000 | 29, 29, 30, 30, 30, 30 | 67, 67, 69, 69, 69, 69 | 29, 29, 30, 30, 30, 30 | 67, 67, 69, 69, 69, 69
Cubic | 4, 10, 100, 500, 1000, 5000 | 15, 16, 16, 16, 16, 16 | 45, 47, 47, 47, 47, 47 | 14, 14, 14, 14, 15, 15 | 39, 39, 39, 39, 43, 43
Rosen | 4, 10, 100, 500, 1000, 5000 | 30, 30, 30, 30, 30, 30 | 85, 85, 85, 85, 85, 85 | 28, 28, 28, 28, 28, 28 | 65, 65, 65, 65, 65, 65
Total | | 1539 | 5233 | 1324 | 4193
Table 2: The improvement percentage of the new method compared with the Polak-Ribière-Polyak (PRP) method.

Tools | PRP | New
NOI | 100% | 86.02989%
NOF | 100% | 80.12612268%
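These percentages follow directly from the totals in Table 1, with the PRP totals taken as 100%; the short snippet below reproduces the arithmetic.

```python
# Improvement percentages from the Table 1 totals: New relative to PRP (= 100%)
noi_prp, nof_prp, noi_new, nof_new = 1539, 5233, 1324, 4193
print(f"NOI: {100 * noi_new / noi_prp:.5f}%")   # prints 86.02989%
print(f"NOF: {100 * nof_new / nof_prp:.5f}%")   # prints 80.12612%
```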
3.2 Application of the New Method to Regression Analysis
Regression analysis is a crucial statistical method frequently utilized in areas such as accounting, economics, management, physics, finance, and beyond (Christensen, 1996; Vandeginste, 1989). It is employed to examine the relationship between independent and dependent variables within different datasets. The regression model can be outlined as follows:

$y = f(x) + \varepsilon$,

where $x$ is the predictor, $y$ is the response variable, and $\varepsilon$ is the error. The regression function is derived so that the sum of squared errors between the observed and fitted values is minimized. A linear regression function is typically applied when the relationship between $x$ and $y$ can be represented by a straight line, although such cases are rare. As a result, nonlinear regression models are often employed. This paper focuses on the nonlinear regression approach.
This subsection provides a detailed overview of population estimates for
the Kurdistan Region of Iraq (KRI) from 1965 to 2020. The statistics in Table 3
are sourced from the data collected by the Kurdistan Region Statistics Office,
Ministry of Planning, Kurdistan Regional Government (Kurdistan Region Statistics Office,
Ministry of Planning, 2021).
In this analysis, the years of data collection are represented by the x-variable, while the population figures for KRI serve as the y-variable. Data from 1965 to 2014 will be used for fitting the model, while the data from 2020 will be reserved for error analysis.
Table 3: Population estimates of KRI.

Years | KRI Population
1965 | 902,000
1987 | 2,015,466
1997 | 2,861,701
2014 | 5,332,600
2020 | 6,171,083
The approximate function for the nonlinear least squares method is derived using the data in Table 3 as follows:
. (3.1)
The function (3.1) is used to approximate the value of $y$ based on the value of $x$, where $x$ denotes the year and $y$ the recorded population of KRI. The above least squares formulation (3.1) is then transformed into the following unconstrained minimization problem:

(3.2)
Data from 1965 to 2014 are used to develop the nonlinear quadratic model using the least squares method, along with the associated test function for the unconstrained optimization problem. It is evident from this analysis that the $x$ and $y$ values exhibit a parabolic relationship, as described by the regression function defined in (3.2) with the corresponding regression parameters.
However, the data for 2020 is excluded from the unconstrained optimization function to facilitate the calculation of relative errors for the predicted data. Consequently, the proposed method is used to solve the test function (3.2) utilizing the strong Wolfe line search technique, with the results presented in Table 4.
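As an illustration of this workflow (not the authors' exact model: the quadratic parameterization, the rescaling of years and population, and the use of SciPy's CG optimizer are assumptions made here), the sketch below fits a quadratic regression to the 1965-2014 data by minimizing the least squares objective and then computes the relative error of the 2020 prediction.

```python
import numpy as np
from scipy.optimize import minimize

# KRI population data from Table 3 (1965-2014 used for fitting, 2020 held out)
years = np.array([1965.0, 1987.0, 1997.0, 2014.0])
pops = np.array([902_000.0, 2_015_466.0, 2_861_701.0, 5_332_600.0])

# Rescale years and counts so the quadratic fit is well conditioned (a modelling choice, not the paper's)
x = (years - 1965.0) / 49.0
y = pops / 1.0e6

def sse(theta):
    """Least squares objective for a quadratic model y = a + b*x + c*x**2."""
    a, b, c = theta
    return np.sum((y - (a + b * x + c * x ** 2)) ** 2)

def sse_grad(theta):
    a, b, c = theta
    r = y - (a + b * x + c * x ** 2)
    return -2.0 * np.array([np.sum(r), np.sum(r * x), np.sum(r * x ** 2)])

# A nonlinear CG solver stands in for the paper's new method here
fit = minimize(sse, x0=np.array([1.0, 1.0, 1.0]), jac=sse_grad, method="CG")
a, b, c = fit.x

x_2020 = (2020.0 - 1965.0) / 49.0
predicted = (a + b * x_2020 + c * x_2020 ** 2) * 1.0e6
actual = 6_171_083.0
print("prediction for 2020:", round(predicted))
print("relative error:", abs(predicted - actual) / actual)
```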
Table 4: Test results for optimization of the quadratic model using the new method.

Initial Points | No. of Iterations | CPU Time
(1, 1, 1) | 13 | 0.015291
(2, 2, 2) | 13 | 0.028205
(10, 10, 10) | 13 | 0.015354
(15, 15, 15) | 15 | 0.023973
Table 5 displays the relative error of the new method compared to the least squares method. A smaller relative error value indicates greater accuracy and a better fit to the observed dataset.
Table 5: Estimation point and relative errors for 2020 data.

Models | Estimated Point | Relative Error
New | 6195047.5029 | 0.0039
Least Squares | 6375777.5441 | 0.0332
This work introduces a novel CG method that is based on the logistic mapping formula to enhance optimization problem-solving. The proposed method satisfies both the descent and sufficient descent conditions, with the latter being a stronger criterion that significantly improves numerical performance. A comprehensive analysis of the global convergence properties confirms the method's effectiveness, establishing it as a robust approach in the optimization field. Numerical experiments demonstrate that the new method outperforms the traditional Polak-Ribière-Polyak CG method, particularly in terms of the number of iterations (NOI) and the number of function evaluations (NOF). The results indicate that the proposed algorithm achieves superior convergence rates and requires fewer computational resources, making it especially valuable for large-scale problems with dimensions ranging up to 5000. Furthermore, the application of the new method to regression analysis, specifically in modelling population estimates for the Kurdistan Region of Iraq, reveals its practical utility. The model produced accurate predictions while maintaining low relative error rates compared to traditional least squares methods. Future research may explore additional refinements and broader applications to further enhance its effectiveness in complex optimization challenges.
REFERENCES

Andrei, N. (2008). Another hybrid conjugate gradient algorithm for unconstrained optimization. Numerical Algorithms, 47(2), 143–156. https://doi.org/10.1007/s11075-007-9152-9
Christensen, R. (1996). Analysis of Variance, Design, and Regression: Applied Statistical Methods. CRC Press, Chapman and Hall.
Dai, Y. H., & Yuan, Y. (1999). A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization, 10(1), 177–182. https://doi.org/10.1137/S1052623497318992
Dai, Y. H., & Yuan, Y. (2001). An Efficient Hybrid Conjugate Gradient Method for Unconstrained Optimization. Annals of Operations Research, 103(1), 33–47. https://doi.org/10.1023/A:1012930416777
Fletcher, R. (1987). Practical Methods of Optimization, Unconstrained Optimization (Vol. 1). John Wiley and Sons.
Fletcher, R., & Reeves, C. M. (1964). Function minimization by conjugate gradients. The Computer Journal, 7(2), 149–154. https://doi.org/10.1093/comjnl/7.2.149
Gilbert, J. C., & Nocedal, J. (1992). Global Convergence Properties of Conjugate Gradient Methods for Optimization. SIAM Journal on Optimization, 2(1), 21–42. https://doi.org/10.1137/0802003
Hestenes, M. R., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409. https://doi.org/10.6028/jres.049.044
Hu, Y. F., & Storey, C. (1991). Global convergence result for conjugate gradient methods. Journal of Optimization Theory and Applications, 71(2), 399–405. https://doi.org/10.1007/BF00939927
Ibrahim, A. L., & Mohammed, M. G. (2022). A new three-term conjugate gradient method for training neural networks with global convergence. Indonesian Journal of Electrical Engineering and Computer Science, 28(1), 551–558. https://doi.org/10.11591/ijeecs.v28.i1.pp551-558
Ibrahim, A. L., & Mohammed, M. G. (2024). A new conjugate gradient for unconstrained optimization problems and its applications in neural networks. Indonesian Journal of Electrical Engineering and Computer Science, 33(1), 93–100. https://doi.org/10.11591/ijeecs.v33.i1.pp93-100
Ibrahim, A. L., & Shareef, S. G. (2019). A new class of three-term conjugate Gradient methods for solving unconstrained minimization problems. General Letters in Mathematics, 7(2). https://doi.org/10.31559/glm2019.7.2.4
Jahwar, B. H., Ibrahim, A. L., Ajeel, S. M., & Shareef, S. G. (2024). Two new classes of conjugate gradient method based on logistic mapping. Telkomnika (Telecommunication Computing Electronics and Control), 22(1), 86–94. https://doi.org/10.12928/TELKOMNIKA.v22i1.25264
Kurdistan Region Statistics Office, Ministry of Planning, K. R. G. (2021). Population Statistics. https://krso.gov.krd/en/statistics/population
Liu, Y., & Storey, C. (1991). Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications, 69(1), 129–137. https://doi.org/10.1007/BF00940464
Lu, H., Zhang, H., & Ma, L. (2006). New optimization algorithm based on chaos. Journal of Zhejiang University - Science A: Applied Physics & Engineering, 7, 539–542. https://doi.org/10.1631/jzus.2006.A0539
Polak, E., & Ribiere, G. (1969). Note sur la convergence de méthodes de directions conjuguées. Revue Française d’informatique et de Recherche Opérationnelle. Série Rouge, 3(R1), 35–43. http://www.numdam.org/item/M2AN_1969__3_1_35_0/
Polyak, B. T. (1969). The conjugate gradient method in extremal problems. Zh. Vychisl. Mat. Mat. Fiz., 9(4), 807–821.
Shareef, S. G., & Ibrahim, A. L. (2016). A new conjugate gradient for unconstrained optimization based on step size of Barzilai and Borwein. Science Journal of University of Zakho, 4(1), 104–114. https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/311
Touati-Ahmed, D., & Storey, C. (1990). Efficient hybrid conjugate gradient techniques. Journal of Optimization Theory and Applications, 64(2), 379–397. https://doi.org/10.1007/BF00939455
Vandeginste, B. G. M. (1989). Nonlinear regression analysis: Its applications, D. M. Bates and D. G. Watts, Wiley, New York, 1988. ISBN 0471‐816434. Price: £34.50. Journal of Chemometrics, 3, 544–545. https://api.semanticscholar.org/CorpusID:120419559
Zhang, L., Zhou, W., & Li, D.-H. (2006). A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA Journal of Numerical Analysis, 26(4), 629–640. https://doi.org/10.1093/imanum/drl016
Zoutendijk, G. (1970). Nonlinear programming, computational methods. In J. Abadie (Ed.), Integer and Nonlinear Programming (pp. 37–86). North-Holland, Amsterdam.