A NEW CONJUGATE GRADIENT METHOD BASED ON LOGISTIC MAPPING FOR UNCONSTRAINED OPTIMIZATION AND ITS APPLICATION IN REGRESSION ANALYSIS
Sarwar Ahmad Hamad 1, Dlovan Haji Omar 1, Diman Abdulqader Sulaiman 1, Alaa Luqman Ibrahim 1,*
1 Department of Mathematics, College of Science, University of Zakho, Zakho, Kurdistan Region, Iraq
Corresponding Author Email: alaa.ibrahim@uoz.edu.krd
Received: 11 May 2024 / Accepted: 28 Jul. 2024 / Published: 14 Nov. 2024. https://doi.org/10.25271/sjuoz.2024.12.3.1310
ABSTRACT:
The study addresses the need for efficient optimization techniques in unconstrained optimization problems, where conventional techniques often suffer from slow and inefficient convergence. Despite advancements in gradient-based techniques, there is still a need for algorithms that strike a balance between computational efficiency and robustness. This work introduces a novel conjugate gradient algorithm based on the logistic mapping formula. As part of the methodology, descent conditions are established, and the suggested algorithm's global convergence properties are thoroughly examined. Comprehensive numerical experiments are used for empirical validation, and the new algorithm is compared to the Polak-Ribière-Polyak (PRP) algorithm. The results show that the suggested approach performs better than the PRP algorithm and is more efficient, since it needs fewer function evaluations and iterations to reach convergence. Furthermore, the usefulness of the suggested approach is demonstrated by its practical use in regression analysis, notably in the modelling of population estimates for the Kurdistan Region of Iraq. In contrast to conventional least squares techniques, the method maintains low relative error rates while producing accurate predictions. Overall, this study presents the novel conjugate gradient algorithm as an effective tool for handling challenging optimization problems in both theoretical and real-world contexts.
KEYWORDS: optimization; conjugate gradient; step size; regression analysis.
Unconstrained optimization problems involve minimizing an objective function that depends only on real variables, with no restrictions on the values those variables may take. This can be expressed mathematically as:
$\min f(x), \quad x \in \mathbb{R}^n$, (1.1)
where $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is continuously differentiable and $g(x) = \nabla f(x)$ is the gradient of $f$ at the point $x$. The conjugate gradient (CG) method is an optimization algorithm that lies between the steepest descent method and the Newton method in terms of computational complexity and convergence behaviour. Unlike the steepest descent method, which updates the solution solely in the direction of the negative gradient, the CG method modifies this direction by adding a scaled contribution of the previous search direction. This adjustment helps to overcome the steepest descent method's limitation of slow convergence. The CG method requires only first-order derivatives, specifically the gradient of the objective function, which significantly reduces computational cost compared to methods that require second-order derivatives, such as the Newton method. The Newton method involves calculating and inverting the Hessian matrix, a process that is computationally expensive and often impractical for large-scale problems. In contrast, a key advantage of the CG method is that it does not require the Hessian matrix or its approximation, making it particularly well-suited for large-scale optimization challenges.
The CG algorithm generates a sequence of iterates $\{x_k\}$ as:
$x_{k+1} = x_k + \alpha_k d_k$, (1.2)
where $\alpha_k > 0$ is the step length and $d_k$ is the search direction, given by:
$d_0 = -g_0$ and
$d_{k+1} = -g_{k+1} + \beta_k d_k$, (1.3)
The parameter $\beta_k$ is crucial, with different choices leading to various CG methods. Over the years, numerous variants of this scheme have been proposed and widely applied in practice, such as the Fletcher-Reeves (FR) (Fletcher & Reeves, 1964), Polak-Ribière-Polyak (PRP) (Polak & Ribiere, 1969; Polyak B. T., 1969), Hestenes-Stiefel (HS) (Hestenes & Stiefel, 1952), Liu-Storey (LS) (Liu & Storey, 1991), Dai-Yuan (DY) (Dai & Yuan, 1999), and Conjugate-Descent (CD) (Fletcher, 1987) methods:
$\beta_k^{FR} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}$, Fletcher-Reeves (1964).
$\beta_k^{PRP} = \dfrac{g_{k+1}^T y_k}{\|g_k\|^2}$, Polak-Ribière-Polyak (1969).
$\beta_k^{HS} = \dfrac{g_{k+1}^T y_k}{d_k^T y_k}$, Hestenes-Stiefel (1952).
$\beta_k^{LS} = -\dfrac{g_{k+1}^T y_k}{d_k^T g_k}$, Liu-Storey (1991).
$\beta_k^{DY} = \dfrac{\|g_{k+1}\|^2}{d_k^T y_k}$, Dai-Yuan (1999).
$\beta_k^{CD} = -\dfrac{\|g_{k+1}\|^2}{d_k^T g_k}$, Conjugate Descent (1987).
where $y_k = g_{k+1} - g_k$.
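For readers who prefer code, the following minimal sketch collects the six classical formulas above in one helper; the function name and argument order are our own illustrative choices and are not part of the paper's FORTRAN 95 implementation.

import numpy as np

def classical_betas(g_new, g_old, d_old):
    # Classical CG parameters; g_new = g_{k+1}, g_old = g_k, d_old = d_k.
    y = g_new - g_old                     # gradient difference y_k
    return {
        "FR":  (g_new @ g_new) / (g_old @ g_old),
        "PRP": (g_new @ y) / (g_old @ g_old),
        "HS":  (g_new @ y) / (d_old @ y),
        "LS":  -(g_new @ y) / (d_old @ g_old),
        "DY":  (g_new @ g_new) / (d_old @ y),
        "CD":  -(g_new @ g_new) / (d_old @ g_old),
    }

# Example usage with arbitrary vectors of the same dimension:
# betas = classical_betas(np.array([1.0, 2.0]), np.array([3.0, 1.0]), np.array([-3.0, -1.0]))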
Numerous researchers have dedicated their efforts to refining (CG) methods, motivated by their widespread adoption in solving optimization problems, as well as their properties of global convergence and low memory utilization. Interestingly, most of these efforts are directed towards improving conventional CG methods, which are the first generation of CG algorithms. For more details, see (Ibrahim & Mohammed, 2022, 2024; Ibrahim & Shareef, 2019; Jahwar et al., 2024; Shareef & Ibrahim, 2016).
Since classical CG methods have well-established convergence properties and fundamental principles, they serve as the basis for all subsequent variants. In spite of their success, these methods have stimulated a wealth of research studies focused on particular issues, such as enhancing robustness, scalability, and convergence rates for high-dimensional problems. Since the gradients in these approaches are mutually orthogonal and the parameters $\beta_k$ coincide, the methods are equivalent when $f$ is a strongly convex quadratic function and the line search is exact. However, their behaviour differs significantly when applied to general nonlinear functions with inexact line searches. Although the strong convergence properties of the FR, DY, and CD algorithms are well known, jamming may cause them to perform poorly in real-world scenarios. Conversely, the PRP, HS, and LS approaches often perform better numerically even though they might not converge in general.
Naturally, researchers strive to develop new techniques that combine the best features of these two categories. Numerous hybrid approaches have been proposed thus far. For instance, Touati-Ahmed and Storey (Touati-Ahmed & Storey, 1990) originally introduced a hybrid conjugate gradient algorithm that combines the FR and PRP techniques in 1990. Subsequently, Hu and Storey (Hu & Storey, 1991), along with Gilbert and Nocedal (Gilbert & Nocedal, 1992), explored other hybrid schemes related to the PRP and FR techniques. Dai and Yuan (Dai & Yuan, 2001) combined the DY technique with the HS method to create two hybrid CG algorithms aimed at enhancing the practical application of the DY method. For large-scale problems in unconstrained optimization, Andrei (Andrei, 2008) presented a novel hybrid CG approach, referred to as the HYBRID algorithm, which is based on the HS and DY methods. A primary characteristic of this hybrid method is that the search direction is the Newton direction. Remarkably, this hybrid technique often outperforms certain complex conjugate gradient methods in various applications.
This research focuses on how new improvements in conjugate gradient methods can contribute to enhanced performance of optimization algorithms, and specifically on how techniques such as logistic mapping can enhance these methods.
The objective of this study is to develop a new conjugate gradient method incorporating logistic mapping and to analyze its performance compared to traditional methods. The focus is on improving convergence rates and robustness, particularly in applications related to regression analysis.
The motivation behind this study is to address the need for more efficient optimization techniques capable of handling large-scale problems and real-world scenarios effectively.
The challenges include ensuring the robustness of the new method, addressing convergence issues, and validating its effectiveness in practical applications.
This study highlights how logistic mapping can improve conjugate gradient methods, offering new insights into optimization by enhancing convergence rates and expanding practical applications.
The structure of the paper is as follows. In the next section, we present our particular technique and the way we determine the parameter $\beta_k$. Under appropriate conditions, the descent and sufficient descent properties of the suggested approach are discussed, together with its global convergence. In Section 3, preliminary numerical results are presented. Finally, we summarize the article.
This section presents a new CG method for solving (1.1). The CG parameter of the algorithm is based on the logistic mapping formula (Lu et al., 2006), which is widely used in optimization. By utilizing the logistic mapping along with the CG parameter of Polak-Ribière-Polyak (PRP), the algorithm's performance can be enhanced. From the logistic mapping formula we have
, (2.1)
To achieve balance, multiplying the second term of equation (2.1) by a scalar, we get:
, (2.2)
where and
After some algebraic operations, we get:
. (2.3)
The steps of the new algorithm are as follows:
Step (1): Initialization. Begin with an initial point $x_0 \in \mathbb{R}^n$ and a tolerance $\epsilon > 0$.
Step (2): Initialization and Gradient Computation. Set $k = 0$ and compute the gradient $g_0 = \nabla f(x_0)$. Define the search direction $d_0 = -g_0$. If $\|g_0\| \le \epsilon$, terminate the algorithm.
Step (3): Line Search. Determine the step length $\alpha_k$ that minimizes the objective function along $d_k$ by using the cubic line search.
Step (4): Update. Update the iterate: $x_{k+1} = x_k + \alpha_k d_k$.
Step (5): Gradient Update and Termination Check. Compute $g_{k+1} = \nabla f(x_{k+1})$. If $\|g_{k+1}\| \le \epsilon$, then stop.
Step (6): Conjugacy Update. Calculate $\beta_k$ based on rule (2.3).
Step (7): Conjugate Direction Update. Update the search direction: $d_{k+1} = -g_{k+1} + \beta_k d_k$.
Step (8): Convergence Check. If either restart condition holds, return to Step 2; otherwise, increment $k$ and return to Step 3.
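To make the flow of Steps 1-8 concrete, the following minimal sketch implements the same loop in Python. Because rule (2.3) is not reproduced above, the sketch substitutes the PRP parameter as a placeholder for $\beta_k$, and a simple backtracking line search stands in for the cubic line search; all names are illustrative and the code is not the authors' FORTRAN 95 implementation.

import numpy as np

def cg_minimize(f, grad, x0, eps=1e-6, max_iter=1000):
    # Illustrative CG loop following Steps 1-8; beta_PRP stands in for rule (2.3).
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                           # Step 2: initial direction
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:                 # termination check
            break
        # Step 3: line search (backtracking Armijo shown for brevity;
        # the paper employs a cubic interpolation line search).
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d                        # Step 4: update the iterate
        g_new = grad(x_new)                          # Step 5: new gradient
        if np.linalg.norm(g_new) <= eps:
            return x_new
        y = g_new - g
        beta = (g_new @ y) / (g @ g)                 # Step 6: PRP placeholder for (2.3)
        d = -g_new + beta * d                        # Step 7: new search direction
        x, g = x_new, g_new
    return x

# Example: minimize a simple quadratic from x0 = (1, ..., 1).
# x_star = cg_minimize(lambda x: float(x @ x), lambda x: 2.0 * x, np.ones(5))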
Theorem 1. Let $\{x_k\}$ and $\{d_k\}$ be two sequences generated by the new method. Then the search direction satisfies the descent condition $g_{k+1}^T d_{k+1} \le 0$.
Proof: Using (1.3) and (2.3), we obtain
, (2.4)
By multiplying both sides of the above equation by $g_{k+1}^T$, we obtain:
, (2.5)
When the step length $\alpha_k$ is determined through an exact line search, yielding $g_{k+1}^T d_k = 0$, the resulting equation is
$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 \le 0$,
and the descent condition is validated.
In the case of an inexact line search, where $g_{k+1}^T d_k \ne 0$, we get
, (2.6)
The claim that the first two terms of equation (2.6) are non-positive follows from the fulfilment of the descent condition by the Polak-Ribière-Polyak (PRP) algorithm. This condition is a fundamental criterion in optimization theory, ensuring that iterative algorithms make progress towards minimizing the objective function. The search direction of the PRP method satisfies the descent condition
$g_{k+1}^T d_{k+1} \le 0$.
It signifies that the chosen descent directions align with the objective of function minimization, thereby facilitating convergence. This condition serves as a crucial safeguard against divergent behaviour and ensures the convergence trajectory of the algorithm.
The adherence of the PRP algorithm to the descent condition engenders confidence in subsequent analytical derivations, particularly in equation (2.6). By establishing the negativity of the initial terms, the proof establishes a robust foundation for affirming the convergence of the CG method, even in scenarios involving inexact line search methodologies.
It is also obvious that the two remaining quantities are positive; consequently, the third term of equation (2.6) is less than or equal to zero. Hence, we get
$g_{k+1}^T d_{k+1} \le 0$.
The descent condition is proved.
Theorem 2. Let $\{x_k\}$ and $\{d_k\}$ be two sequences generated by the new method. Then the search direction satisfies the sufficient descent condition:
$g_{k+1}^T d_{k+1} \le -c\,\|g_{k+1}\|^2$, for some constant $c > 0$ and for any $k \ge 0$.
Proof: The two initial terms of equation (2.6) are demonstrably less than or equal to zero due to the properties of the Polak-Ribière-Polyak (PRP) algorithm, which satisfies the descent condition. Therefore, we obtain:
,
let .
Then, $g_{k+1}^T d_{k+1} \le -c\,\|g_{k+1}\|^2$, and the sufficient descent condition is proved.
In this section, we establish the global convergence of the new method by relying on the following basic assumptions regarding the objective function.
Assumption (*): (Zoutendijk, 1970),
1. Lower Bound: The objective function $f$ is bounded below on $\mathbb{R}^n$, and the level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.
2. Continuity and Differentiability: The objective function $f$ is continuously differentiable in a neighbourhood $N$ of $S$.
3. Gradient Bound: In the neighbourhood $N$ of $S$, the gradient $g(x) = \nabla f(x)$ is Lipschitz continuous with Lipschitz constant $L > 0$, i.e.
$\|g(x) - g(y)\| \le L\,\|x - y\|$ for all $x, y \in N$.
From the above assumptions, it follows that there exists a positive constant $\Gamma$ such that
$\|g(x)\| \le \Gamma$ for all $x \in S$. (2.7)
If $f$ is uniformly convex, there exists a constant $\mu > 0$ such that:
$(g(x) - g(y))^T (x - y) \ge \mu \|x - y\|^2$ for all $x, y \in S$, (2.8)
which we can rewrite in the following manner:
$y_k^T s_k \ge \mu \|s_k\|^2$, where $s_k = x_{k+1} - x_k = \alpha_k d_k$. (2.9)
These assumptions are standard in optimization theory and provide a foundation for demonstrating the convergence properties of the proposed method. Specifically, the continuity and differentiability assumption ensures the smoothness of the objective function, while the Lipschitz condition on the gradient guarantees that changes in the gradient are controlled. Finally, the lower bound condition ensures that the optimization process does not diverge to $-\infty$, thus supporting the argument for global convergence.
Lemma 1. (Zhang et al., 2006). Assume the aforementioned conditions hold. Consider the methods (1.2) and (1.3), where $d_k$ is a descent direction and $\alpha_k$ satisfies the standard Wolfe line search. If
$\sum_{k \ge 1} \dfrac{1}{\|d_k\|^2} = \infty$,
then $\liminf_{k \to \infty} \|g_k\| = 0$.
Theorem 3. If the sequences $\{x_k\}$, $\{g_k\}$, $\{d_k\}$, and $\{\alpha_k\}$ are generated by our algorithm and the assumptions (*) hold, then
$\liminf_{k \to \infty} \|g_k\| = 0$.
Proof: From equations (1.3) and (2.3), we have
, (2.10)
,
since,
, (2.11)
From the Lipschitz condition and by using (2.9), we get
,
,
Since,, .
Hence,
,
,
,
By using Lemma 1, we get $\liminf_{k \to \infty} \|g_k\| = 0$. The proof is complete.
This section presents the numerical results of the new method for unconstrained optimization and its application to regression analysis.
3.1 The Unconstrained Optimization
The objective of this subsection is to assess the efficacy of the new method in solving optimization problems, compared against the Polak-Ribière-Polyak (PRP) method. We employ a comparative analysis using well-established nonlinear test problems with varying dimensionalities (ranging from $n = 4$ to $n = 5000$). Each implementation is written in FORTRAN 95 to ensure precision and reliability. Central to the evaluation is the use of the cubic interpolation method within the line search procedure, which exploits both function and gradient values. The key performance metrics, namely the number of iterations (NOI) and the number of function evaluations (NOF), are documented in Table 1. The experimental results, summarized in Table 2, underscore the superiority of the proposed technique over the PRP method in terms of both NOF and NOI, reaffirming the efficacy and efficiency of the new approach. This evaluation framework not only provides empirical validation of the method's effectiveness but also underscores its potential to advance the state of the art in optimization methodologies.
Table 1: The results of the new method compared with the Polak-Ribière-Polyak (PRP) method.
Test Function | P (dimension) | PRP NOI | PRP NOF | New NOI | New NOF
Wolfe | 5, 10, 100, 500, 1000, 5000 | 14, 32, 49, 58, 64, 99 | 29, 65, 99, 117, 129, 214 | 12, 25, 42, 48, 48, 83 | 20, 50, 80, 90, 90, 161
Mile | 5, 10, 100, 500, 1000, 5000 | 37, 37, 44, 44, 50, 50 | 116, 116, 148, 148, 180, 180 | 29, 29, 29, 29, 50, 50 | 89, 89, 89, 89, 180, 180
Central | 5, 10, 100, 500, 1000, 5000 | 22, 22, 22, 23, 23, 30 | 159, 159, 159, 171, 171, 270 | 22, 22, 22, 23, 23, 22 | 159, 159, 159, 170, 170, 159
Powell | 5, 10, 100, 500, 1000, 5000 | 40, 40, 43, 46, 46, 50 | 120, 120, 135, 150, 150, 180 | 30, 32, 32, 32, 38, 38 | 80, 90, 90, 90, 97, 97
Sum | 5, 10, 100, 500, 1000, 5000 | 6, 6, 14, 21, 23, 31 | 39, 34, 80, 123, 127, 145 | 6, 6, 11, 17, 17, 25 | 39, 34, 59, 86, 86, 120
Wood | 5, 10, 100, 500, 1000, 5000 | 29, 29, 30, 30, 30, 30 | 67, 67, 69, 69, 69, 69 | 29, 29, 30, 30, 30, 30 | 67, 67, 69, 69, 69, 69
Cubic | 4, 10, 100, 500, 1000, 5000 | 15, 16, 16, 16, 16, 16 | 45, 47, 47, 47, 47, 47 | 14, 14, 14, 14, 15, 15 | 39, 39, 39, 39, 43, 43
Rosen | 4, 10, 100, 500, 1000, 5000 | 30, 30, 30, 30, 30, 30 | 85, 85, 85, 85, 85, 85 | 28, 28, 28, 28, 28, 28 | 65, 65, 65, 65, 65, 65
Total | – | 1539 | 5233 | 1324 | 4193
Table 2: The improvement percentage of the new method compared with the Polak-Ribière-Polyak (PRP) method.
Tools | PRP | New
NOI | 100% | 86.02989%
NOF | 100% | 80.12612268%
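The percentages in Table 2 follow directly from the totals in Table 1; the short check below (variable names are ours) reproduces them.

# Improvement percentages computed from the Table 1 totals (PRP vs. new method).
prp_noi, prp_nof = 1539, 5233
new_noi, new_nof = 1324, 4193
print(f"NOI: {100 * new_noi / prp_noi:.5f}%")   # approx. 86.02989%
print(f"NOF: {100 * new_nof / prp_nof:.8f}%")   # approx. 80.12612268%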
3.2 Application of the New Method to Regression Analysis
Regression analysis is a crucial statistical method frequently utilized in areas such as accounting, economics, management, physics, finance, and beyond (Christensen, 1996; Vandeginste, 1989). It is employed to examine the relationship between independent and dependent variables within different datasets. The purpose of regression analysis can be outlined as follows:
$y = f(x) + \varepsilon$,
where $x$ is the predictor, $y$ is the response variable, and $\varepsilon$ is the error term. The regression function $f(x)$ is derived so as to minimize the discrepancy between the fitted and observed values of $y$. This approach is typically applied when the relationship between $x$ and $y$ can be represented by a straight line, although such cases are rare. As a result, nonlinear regression models are often employed. This paper focuses on the nonlinear regression approach.
This subsection provides a detailed overview of population estimates for the Kurdistan Region of Iraq (KRI) from 1965 to 2020. The statistics in Table 3 are sourced from the data collected by the Kurdistan Region Statistics Office, Ministry of Planning, Kurdistan Regional Government (Kurdistan Region Statistics Office, Ministry of Planning, 2021). In this analysis, the years of data collection are represented by the $t$-variable, while the population figures for the KRI serve as the $y$-variable. Data from 1965 to 2014 are used for fitting the model, while the data from 2020 are reserved for error analysis.
Table 3: Population estimates of the KRI.
Years | KRI Population
1965 | 902,000
1987 | 2,015,466
1997 | 2,861,701
2014 | 5,332,600
2020 | 6,171,083
The approximating function for the nonlinear least squares method is derived using the data in Table 3 as follows:
. (3.1)
The function (3.1) is used to approximate the value of $y$ from the value of $t$. Let $t$ denote the year and $y$ the recorded population of the KRI. Then, the above least squares problem (3.1) is transformed into the following unconstrained minimization problem:
$\min f(x_1, x_2, x_3) = \sum_{i} \left[ y_i - \left( x_1 + x_2 t_i + x_3 t_i^2 \right) \right]^2$, (3.2)
Data from 1965 to 2014 are used to develop the nonlinear quadratic model using the least squares method, along with the associated test function for the unconstrained optimization problem. It is evident from this analysis that $t$ and $y$ exhibit a parabolic relationship, as described by the regression function defined in (3.2) with the regression parameters $x_1$, $x_2$, and $x_3$.
However, the data for 2020 is excluded from the unconstrained optimization function to facilitate the calculation of relative errors for the predicted data. Consequently, the proposed method is used to solve the test function (3.2) utilizing the strong Wolfe line search technique, with the results presented in Table 4.
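For illustration only, the sketch below fits the same quadratic model to the Table 3 data (1965-2014) with ordinary least squares and evaluates the held-out 2020 point; it uses numpy rather than the paper's CG-based solver and FORTRAN 95 code, so its coefficients and errors will not match Tables 4 and 5 exactly.

import numpy as np

# Table 3 data used for fitting (1965-2014); the 2020 value is held out.
t = np.array([1965.0, 1987.0, 1997.0, 2014.0])
y = np.array([902_000.0, 2_015_466.0, 2_861_701.0, 5_332_600.0])

# Quadratic model y ~ x1 + x2*t + x3*t**2, fitted here by ordinary least squares;
# the paper instead minimizes the same sum of squares with the new CG method.
A = np.vstack([np.ones_like(t), t, t**2]).T
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Hold-out prediction for 2020 and its relative error against the observed value.
y_2020 = coef @ np.array([1.0, 2020.0, 2020.0**2])
rel_err = abs(y_2020 - 6_171_083) / 6_171_083
print(y_2020, rel_err)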
Table 4: Test results for optimization of the quadratic model using the new method.
Initial Points | No. of Iterations | CPU Time
(1, 1, 1) | 13 | 0.015291
(2, 2, 2) | 13 | 0.028205
(10, 10, 10) | 13 | 0.015354
(15, 15, 15) | 15 | 0.023973
Table 5 displays the relative error of the new method compared to the least squares method. A smaller relative error value indicates greater accuracy and a better fit to the observed dataset.
Table 5: Estimation point and relative errors for the 2020 data.
Models | Point | Relative Error
New | 6195047.5029 | 0.0039
Least Squares | 6375777.5441 | 0.0332
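The relative errors in Table 5 can be recomputed directly from the reported prediction points and the observed 2020 population; a quick check (variable names are ours):

# Relative errors from Table 5, recomputed against the observed 2020 population.
actual = 6_171_083
for name, predicted in [("New", 6_195_047.5029), ("Least Squares", 6_375_777.5441)]:
    print(name, round(abs(predicted - actual) / actual, 4))   # 0.0039 and 0.0332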
This work introduces a novel conjugate gradient (CG) method based on the logistic mapping formula to enhance optimization problem-solving. The proposed method satisfies both the descent and sufficient descent conditions, with the latter being a stronger criterion that significantly improves numerical performance. A comprehensive analysis of the global convergence properties confirms the method's effectiveness, establishing it as a robust approach in the optimization field. Numerical experiments demonstrate that the new method outperforms the traditional Polak-Ribière-Polyak CG method, particularly in terms of the number of iterations (NOI) and the number of function evaluations (NOF). The results indicate that the proposed algorithm achieves superior convergence rates and requires fewer computational resources, making it especially valuable for large-scale problems with dimensions ranging up to 5000. Furthermore, the application of the new method to regression analysis, specifically in modelling population estimates for the Kurdistan Region of Iraq, reveals its practical utility. The model produced accurate predictions while maintaining low relative error rates compared to traditional least squares methods. Future research may explore additional refinements and broader applications to further enhance its effectiveness in complex optimization challenges.
Andrei, N. (2008). Another hybrid conjugate gradient algorithm for unconstrained optimization. Numerical Algorithms, 47(2), 143–156. https://doi.org/10.1007/s11075-007-9152-9
Christensen, R. (1996). Analysis of Variance, Design, and Regression: Applied Statistical Methods. CRC Press, Chapman and Hall.
Dai, Y. H., & Yuan, Y. (1999). A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization, 10(1), 177–182. https://doi.org/10.1137/S1052623497318992
Dai, Y. H., & Yuan, Y. (2001). An Efficient Hybrid Conjugate Gradient Method for Unconstrained Optimization. Annals of Operations Research, 103(1), 33–47. https://doi.org/10.1023/A:1012930416777
Fletcher, R. (1987). Practical Methods of Optimization, Unconstrained Optimization (Vol. 1). John Wiley and Sons.
Fletcher, R., & Reeves, C. M. (1964). Function minimization by conjugate gradients. The Computer Journal, 7(2), 149–154. https://doi.org/10.1093/comjnl/7.2.149
Gilbert, J. C., & Nocedal, J. (1992). Global Convergence Properties of Conjugate Gradient Methods for Optimization. SIAM Journal on Optimization, 2(1), 21–42. https://doi.org/10.1137/0802003
Hestenes, M. R., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409. https://doi.org/10.6028/jres.049.044
Hu, Y. F., & Storey, C. (1991). Global convergence result for conjugate gradient methods. Journal of Optimization Theory and Applications, 71(2), 399–405. https://doi.org/10.1007/BF00939927
Ibrahim, A. L., & Mohammed, M. G. (2022). A new three-term conjugate gradient method for training neural networks with global convergence. Indonesian Journal of Electrical Engineering and Computer Science, 28(1), 551–558. https://doi.org/10.11591/ijeecs.v28.i1.pp551-558
Ibrahim, A. L., & Mohammed, M. G. (2024). A new conjugate gradient for unconstrained optimization problems and its applications in neural networks. Indonesian Journal of Electrical Engineering and Computer Science, 33(1), 93–100. https://doi.org/10.11591/ijeecs.v33.i1.pp93-100
Ibrahim, A. L., & Shareef, S. G. (2019). A new class of three-term conjugate Gradient methods for solving unconstrained minimization problems. General Letters in Mathematics, 7(2). https://doi.org/10.31559/glm2019.7.2.4
Jahwar, B. H., Ibrahim, A. L., Ajeel, S. M., & Shareef, S. G. (2024). Two new classes of conjugate gradient method based on logistic mapping. Telkomnika (Telecommunication Computing Electronics and Control), 22(1), 86–94. https://doi.org/10.12928/TELKOMNIKA.v22i1.25264
Kurdistan Region Statistics Office, Ministry of Planning, K. R. G. (2021). Population Statistics. https://krso.gov.krd/en/statistics/population
Liu, Y., & Storey, C. (1991). Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications, 69(1), 129–137. https://doi.org/10.1007/BF00940464
Lu, H., Zhang, H., & Ma, L. (2006). New optimization algorithm based on chaos. Journal of Zhejiang University - Science A: Applied Physics & Engineering, 7, 539–542. https://doi.org/10.1631/jzus.2006.A0539
Polak, E., & Ribiere, G. (1969). Note sur la convergence de méthodes de directions conjuguées. Revue Française d’informatique et de Recherche Opérationnelle. Série Rouge, 3(R1), 35–43. http://www.numdam.org/item/M2AN_1969__3_1_35_0/
Polyak, B. T. (1969). The Conjugate Gradient Method in Extremal Problems. Zh. Vychisl. Mat. Mat. Fiz., 9(4), 807–821.
Shareef, S. G., & Ibrahim, A. L. (2016). A New Conjugate Gradient for Unconstrained Optimization Based on Step Size of Barzilai and Borwein. Science Journal of University of Zakho, 4(1), 104–114. https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/311
Touati-Ahmed, D., & Storey, C. (1990). Efficient hybrid conjugate gradient techniques. Journal of Optimization Theory and Applications, 64(2), 379–397. https://doi.org/10.1007/BF00939455
Vandeginste, B. G. M. (1989). Nonlinear regression analysis: Its applications, D. M. Bates and D. G. Watts, Wiley, New York, 1988. Journal of Chemometrics, 3, 544–545. https://api.semanticscholar.org/CorpusID:120419559
Zhang, L., Zhou, W., & Li, D.-H. (2006). A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA Journal of Numerical Analysis, 26(4), 629–640. https://doi.org/10.1093/imanum/drl016
Zoutendijk, G. (1970). Nonlinear Programming, Computational Methods. In: Abadie, J. (Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 37–86.