A NEW CONJUGATE GRADIENT METHOD BASED ON LOGISTIC MAPPING FOR UNCONSTRAINED OPTIMIZATION AND ITS APPLICATION IN REGRESSION ANALYSIS

 

Sarwar Ahmad Hamad 1, Dlovan Haji Omar 1, Diman Abdulqader Sulaiman 1, Alaa Luqman Ibrahim 1,*

 

1 Department of Mathematics, College of Science, University of Zakho, Zakho, Kurdistan Region, Iraq

Corresponding Author Email: alaa.ibrahim@uoz.edu.krd

 

Received: 11 May. 2024 / Accepted: 28 Jul., 2024 / Published: 14 Nov., 2024.               https://doi.org/10.25271/sjuoz.2024.12.3.1310

ABSTRACT:

The study tackles the critical need for efficient optimization techniques in unconstrained optimization problems, where conventional techniques often suffer from slow and inefficient convergence. There is still a need for algorithms that strike a balance between computational efficiency and robustness, despite advancements in gradient-based techniques. This work introduces a novel conjugate gradient algorithm based on the logistic mapping formula. As part of the methodology, descent conditions are established, and the suggested algorithm's global convergence properties are thoroughly examined. Comprehensive numerical experiments are used for empirical validation, and the new algorithm is compared to the Polak-Ribière-Polyak (PRP) algorithm. According to the results, the suggested approach performs better than the PRP algorithm and is more efficient, since it needs fewer function evaluations and iterations to reach convergence. Furthermore, the usefulness of the suggested approach is demonstrated by its actual use in regression analysis, notably in the modelling of population estimates for the Kurdistan Region of Iraq. In contrast to conventional least squares techniques, the method maintains low relative error rates while producing accurate predictions. All things considered, this study presents the novel conjugate gradient algorithm as an effective tool for handling challenging optimization problems in both theoretical and real-world contexts.

KEYWORDS: optimization; conjugate gradient; step size; regression analysis.


1.     Introduction


        Unconstrained optimization problems involve minimizing an objective function that depends only on real variables, with no restrictions on the values those variables may take. This can be expressed mathematically as:

$$\min_{x \in \mathbb{R}^n} f(x),                                                          (1.1)$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, $x \in \mathbb{R}^n$, and $g_k = \nabla f(x_k)$ is the gradient of $f$ at the point $x_k$. The conjugate gradient (CG) method is an optimization algorithm that lies between the steepest descent method and the Newton method in terms of computational complexity and convergence behaviour. Unlike the steepest descent method, which updates the solution solely in the direction of the negative gradient, the CG method modifies this direction by adding a positive multiple of the previous search direction. This adjustment helps to overcome the steepest descent method's limitation of slow convergence. The CG method only requires the computation of first-order derivatives, specifically the gradient of the objective function, which significantly reduces the computational cost compared with methods that require second-order derivatives, such as the Newton method. The Newton method involves calculating and inverting the Hessian matrix, a process that is computationally expensive and often impractical for large-scale problems. In contrast, a key advantage of the CG method is that it does not require the Hessian matrix or its approximation, making it particularly well suited for large-scale optimization challenges.

The CG algorithm typically generates a sequence of iterates $\{x_k\}$ as:

$$x_{k+1} = x_k + \alpha_k d_k,                                         (1.2)$$

where $\alpha_k$ is the step length and $d_k$ is a search direction given by:

$$d_0 = -g_0 \quad \text{and}$$

$$d_{k+1} = -g_{k+1} + \beta_k d_k,                                               (1.3)$$

The parameter $\beta_k$ is crucial, with different choices leading to various CG methods. Over the years, numerous variants of this scheme have been proposed and widely applied in practice, such as the Fletcher-Reeves (FR) (Fletcher & Reeves, 1964), Polak-Ribière-Polyak (PRP) (Polak & Ribiere, 1969; Polyak, 1969), Hestenes-Stiefel (HS) (Hestenes & Stiefel, 1952), Liu-Storey (LS) (Liu & Storey, 1991), Dai-Yuan (DY) (Dai & Yuan, 1999), and Conjugate-Descent (CD) (Fletcher, 1987) methods.

$$\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2},$$

Fletcher-Reeves (1964).

$$\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2},$$

Polak-Ribière-Polyak (1969).

$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k},$$

Hestenes-Stiefel (1952).

$$\beta_k^{LS} = -\frac{g_{k+1}^T y_k}{d_k^T g_k},$$

Liu-Storey (1991).

$$\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^T y_k},$$

Dai-Yuan (1999).

$$\beta_k^{CD} = -\frac{\|g_{k+1}\|^2}{d_k^T g_k},$$

Conjugate Descent (1987).

where $y_k = g_{k+1} - g_k$.
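For reference, the sketch below evaluates these classical parameters and the direction update (1.3) in NumPy. It is an illustration only; the function names are ours and the code is not taken from the paper.

```python
import numpy as np

def beta_classical(g_new, g_old, d_old, rule="PRP"):
    """Classical conjugate gradient parameters (illustrative sketch).

    g_new, g_old : gradients at x_{k+1} and x_k
    d_old        : previous search direction d_k
    """
    y = g_new - g_old                          # y_k = g_{k+1} - g_k
    if rule == "FR":                           # Fletcher-Reeves
        return (g_new @ g_new) / (g_old @ g_old)
    if rule == "PRP":                          # Polak-Ribiere-Polyak
        return (g_new @ y) / (g_old @ g_old)
    if rule == "HS":                           # Hestenes-Stiefel
        return (g_new @ y) / (d_old @ y)
    if rule == "LS":                           # Liu-Storey
        return -(g_new @ y) / (d_old @ g_old)
    if rule == "DY":                           # Dai-Yuan
        return (g_new @ g_new) / (d_old @ y)
    if rule == "CD":                           # Conjugate Descent
        return -(g_new @ g_new) / (d_old @ g_old)
    raise ValueError(f"unknown rule: {rule}")

def next_direction(g_new, g_old, d_old, rule="PRP"):
    """Search-direction update (1.3): d_{k+1} = -g_{k+1} + beta_k * d_k."""
    return -g_new + beta_classical(g_new, g_old, d_old, rule) * d_old
```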

        Numerous researchers have dedicated their efforts to refining (CG) methods, motivated by their widespread adoption in solving optimization problems, as well as their properties of global convergence and low memory utilization. Interestingly, most of these efforts are directed towards improving conventional CG methods, which are the first generation of CG algorithms. For more details, see (Ibrahim & Mohammed, 2022, 2024; Ibrahim & Shareef, 2019; Jahwar et al., 2024; Shareef & Ibrahim, 2016).

Since classical CG methods have well-established convergence properties and fundamental principles, they serve as the basis for all subsequent variants. Despite their success, these methods have stimulated a wealth of research focused on particular issues, such as enhancing robustness, scalability, and convergence rates for high-dimensional problems. When $f$ is a strongly convex quadratic function and the line search is exact, the gradients are mutually orthogonal and the parameters $\beta_k$ coincide, so these methods are equivalent. However, their behaviour differs significantly when applied to general nonlinear functions with inexact line searches. Although the strong convergence properties of the FR, DY, and CD algorithms are well known, jamming may cause them to perform poorly in real-world scenarios. Conversely, the PRP, HS, and LS methods often perform better numerically, even though they may fail to converge in general.

        Naturally, researchers strive to develop new techniques that combine the best features of these two categories. Numerous hybrid approaches have been proposed thus far. For instance, Touati-Ahmed and Storey (Touati-Ahmed & Storey, 1990) introduced a hybrid conjugate gradient algorithm that combines the FR and PRP techniques in 1990. Subsequently, Hu and Storey (Hu & Storey, 1991), along with Gilbert and Nocedal (Gilbert & Nocedal, 1992), explored other hybrid schemes related to the PRP and FR techniques. Dai and Yuan (Dai & Yuan, 2001) combined the DY technique with the HS method to create two hybrid CG algorithms aimed at enhancing the practical application of the DY method. For large-scale problems in unconstrained optimization, Andrei (Andrei, 2008) presented a novel hybrid CG approach, referred to as the HYBRID algorithm, which is based on the HS and DY methods. A primary characteristic of this hybrid method is that the search direction is the Newton direction. Remarkably, this hybrid technique often outperforms certain complex conjugate gradient methods in various applications.

        This research focuses on how new improvements in conjugate gradient methods can contribute to enhanced performance of optimization algorithms, and specifically on how techniques such as logistic mapping can enhance these methods.

The objective of this study is to develop a new conjugate gradient method incorporating logistic mapping and to analyze its performance compared to traditional methods. The focus is on improving convergence rates and robustness, particularly in applications related to regression analysis.

The motivation behind this study is to address the need for more efficient optimization techniques capable of handling large-scale problems and real-world scenarios effectively.

The challenges include ensuring the robustness of the new method, addressing convergence issues, and validating its effectiveness in practical applications.

This study highlights how logistic mapping can improve conjugate gradient methods, offering new insights into optimization by enhancing convergence rates and expanding practical applications.

        The structure of the paper is as follows: In the next section, we present our technique and the rule used to determine the parameter $\beta_k$. Under appropriate conditions, the descent and sufficient descent properties of the suggested approach are established, together with its global convergence. In Section 3, preliminary numerical results are presented. Finally, we summarize the article.

2.      New Conjugate Gradient Methods

        This section presents a new CG method for solving (1.1). The CG parameter of the algorithm is based on the logistic mapping formula (Lu et al., 2006), which is widely used in optimization. By utilizing the logistic mapping along with the CG parameter of Polak-Ribière-Polyak (PRP), the algorithm's performance can be enhanced. From the logistic mapping formula, we have

$$x_{k+1} = \mu x_k (1 - x_k), \quad \mu \in (0, 4],                                                          (2.1)$$

to achieve balance, multiplying the second term of equation (2.1) by a scalar, we get:

,                                              (2.2)

where  and 

After some algebraic operations, we get:

.                                                 (2.3)

2.1     Algorithm of the New (CG) Method:

Step (1): Initialization

               Begin with an initial point $x_0 \in \mathbb{R}^n$ and set $k = 0$.

Step (2): Gradient Computation

               Compute the gradient $g_0 = \nabla f(x_0)$ and define the search direction $d_0 = -g_0$.

               If $\|g_0\| = 0$, terminate the algorithm.

Step (3): Line Search

               Determine the step length $\alpha_k$ that minimizes the objective function $f(x_k + \alpha d_k)$ by using a cubic line search.

Step (4): Update

               Update the iterate: $x_{k+1} = x_k + \alpha_k d_k$.

Step (5): Gradient Update and Termination Check

               Compute $g_{k+1} = \nabla f(x_{k+1})$. If $\|g_{k+1}\| \le \varepsilon$, then stop.

Step (6): Conjugacy Update

               Calculate $\beta_k$ based on the new rule (2.3).

Step (7): Conjugate Direction Update

               Update the search direction: $d_{k+1} = -g_{k+1} + \beta_k d_k$.

Step (8): Convergence and Restart Check

               If either restart condition is satisfied, return to Step (2); otherwise, increment $k$ and return to Step (3).
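For illustration only, the following Python sketch mirrors Steps (1)-(8). The new rule (2.3) is not reproduced here, so the conjugacy parameter is passed in as a function; the PRP formula is used purely as a placeholder for it, and a simple backtracking (Armijo) line search stands in for the cubic line search used in the paper. All names are ours.

```python
import numpy as np

def prp_beta(g_new, g_old, d_old):
    """Polak-Ribiere-Polyak parameter, shown only as a placeholder for
    the paper's logistic-mapping-based rule (2.3)."""
    return g_new @ (g_new - g_old) / (g_old @ g_old)

def cg_minimize(f, grad, x0, beta_rule=prp_beta, eps=1e-6, max_iter=1000):
    """Skeleton of the algorithm in Section 2.1 (illustrative sketch only)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                     # Step (2): d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:           # Step (5): termination check
            return x
        # Step (3): backtracking (Armijo) line search for alpha_k along d
        alpha, fx, slope = 1.0, f(x), g @ d
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d                  # Step (4): x_{k+1} = x_k + alpha_k d_k
        g_new = grad(x_new)                    # Step (5): new gradient
        beta = beta_rule(g_new, g, d)          # Step (6): conjugacy parameter
        d = -g_new + beta * d                  # Step (7): d_{k+1} = -g_{k+1} + beta_k d_k
        x, g = x_new, g_new
    return x

# Example usage on a simple quadratic f(x) = ||x||^2.
x_star = cg_minimize(lambda x: x @ x, lambda x: 2 * x, np.array([3.0, -4.0]))
```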

 

Theorem 1. Let $\{g_k\}$ and $\{d_k\}$ be two sequences generated by the new method. Then the search direction satisfies the descent condition:

$$g_k^T d_k < 0.$$

Proof: Using (1.3) and (2.3), we obtain

,                             (2.4)

multiplying both sides of the above equation by $g_{k+1}^T$, we obtain:

  ,                                                                                                                    (2.5)

when the step length $\alpha_k$ is determined through an exact line search, yielding $g_{k+1}^T d_k = 0$, the resulting equation is

 .

so the descent condition is validated.

In the case of an inexact line search, $g_{k+1}^T d_k \ne 0$, and we get

 ,          (2.6)

        The affirmation that the initial two terms of equation (2.6) adhere to non-positivity stems from the fulfilment of the descent condition by the Polak-Ribière-Polyak (PRP) algorithm. This condition is a fundamental criterion in optimization theory, ensuring that iterative algorithms make progress towards minimizing the objective function.

The search direction of the PRP method satisfies the descent condition

$$g_k^T d_k^{PRP} < 0.$$

        It signifies that the chosen descent directions align with the objective of function minimization, thereby facilitating convergence. This condition serves as a crucial safeguard against divergent behaviour and ensures the convergence trajectory of the algorithm.

The adherence of the PRP algorithm to the descent condition engenders confidence in subsequent analytical derivations, particularly in equation (2.6). By establishing the negativity of the initial terms, the proof establishes a robust foundation for affirming the convergence of the CG method, even in scenarios involving inexact line search methodologies.

The remaining factors are clearly positive, so the third term of equation (2.6) is less than or equal to zero. Hence, we get

 .

The descent condition is proved.

Theorem 2. Let $\{g_k\}$ and $\{d_k\}$ be two sequences generated by the new method. Then the search direction satisfies the sufficient descent condition:

$$g_k^T d_k \le -c\,\|g_k\|^2, \quad \text{for some constant } c > 0.$$

Proof: The first two terms of equation (2.6) are demonstrably less than or equal to zero due to the properties of the Polak-Ribière-Polyak (PRP) algorithm, which satisfies the descent condition. Therefore, we obtain:

,

let    .

Then $g_k^T d_k \le -c\,\|g_k\|^2$, which completes the proof.

In this section, we establish the global convergence of the new method by relying on the following basic assumptions regarding the objective function.

Assumption (*): (Zoutendijk, 1970)

1.     Lower Bound: The objective function $f$ is bounded below on $\mathbb{R}^n$, and the level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

2.     Continuity and Differentiability: The objective function $f$ is continuously differentiable on $S$.

3.     Gradient Bound: In some neighbourhood $N$ of $S$, $f$ is continuously differentiable, and its gradient is Lipschitz continuous with Lipschitz constant $L > 0$, i.e.

$$\|g(x) - g(y)\| \le L\,\|x - y\|, \quad \forall\, x, y \in N.$$

From the above assumptions, and since $g$ is continuous on the bounded level set $S$, there exists a positive constant $\Gamma$ such that

$$\|g(x)\| \le \Gamma, \quad \forall\, x \in S.                                                                  (2.7)$$

If $f$ is uniformly convex, then there exists a constant $\mu > 0$ such that:

$$\left(g(x) - g(y)\right)^T (x - y) \ge \mu\,\|x - y\|^2, \quad \forall\, x, y \in S,                            (2.8)$$

we can rewrite the above inequality in the following manner:

$$y_k^T s_k \ge \mu\,\|s_k\|^2,                                                                                        (2.9)$$

where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$.

        These assumptions are standard in optimization theory and provide a foundation for demonstrating the convergence properties of the proposed method. Specifically, the continuity and differentiability assumption ensures the smoothness of the objective function, while the Lipschitz condition on the gradient guarantees that changes in the gradient are controlled. Finally, the lower bound condition ensures that the optimization process does not diverge to −∞, thus supporting the argument for global convergence.

Lemma 1. (Zhang et al., 2006). Assume the aforementioned conditions hold. Consider the methods (1.2) and (1.3), where $d_k$ is a descent direction and $\alpha_k$ satisfies the standard Wolfe line search. If

$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty,$$

then

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

Theorem 3. If the sequences $\{x_k\}$, $\{g_k\}$, and $\{d_k\}$ are generated by our algorithm and Assumption (*) holds, then

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

Proof: From equations (1.3) and (2.3), we have

 ,             (2.10)

,                                                                                                

since,

 ,                                                                  (2.11)

from the Lipschitz condition and by using (2.9), we get

,           

,                                                                                                                                                          

Since

 Hence,

 ,

 ,

  ,

By using Lemma 1, we get $\liminf_{k \to \infty} \|g_k\| = 0$. The proof is complete.

3.     Numerical results

        This section presents the numerical results of the new method for unconstrained optimization and its application to regression analysis.

 

3.1 The Unconstrained Optimization

        The objective of this subsection is to assess the efficacy of our new method in addressing optimization problems, compared against the Polak-Ribière-Polyak (PRP) method. We employ a comparative analysis using well-established nonlinear test problems with varying dimensionality (with dimensions $n$ ranging from 4 to 5000). All codes are implemented in FORTRAN 95. Central to our evaluation is the use of the cubic interpolation method within the line search procedure, which leverages both function and gradient values to navigate the optimization landscape effectively. The key performance metrics, namely the number of iterations (NOI) and the number of function evaluations (NOF), are documented in Table 1. The experimental results, summarized in Table 2, underscore the superiority of the proposed technique over the PRP method in terms of both NOF and NOI, reaffirming the efficacy and efficiency of the new approach. This evaluation framework provides empirical validation of the method's effectiveness and underscores its potential to advance the state of the art in optimization methodologies.
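The cubic line search itself is not listed in the paper. For readers who want a concrete reference point, the sketch below implements one standard cubic-interpolation step: the minimizer of the cubic that matches the function value and directional derivative of $\phi(\alpha) = f(x_k + \alpha d_k)$ at two trial step lengths. The function name and the bisection fallback are our own choices, not the paper's code.

```python
import math

def cubic_step(a, fa, dfa, b, fb, dfb):
    """Minimizer of the Hermite cubic matching phi(a), phi'(a), phi(b), phi'(b).

    Illustrative sketch of a single cubic-interpolation step inside a line
    search; a and b are trial step lengths, fa/fb are function values and
    dfa/dfb are directional derivatives of phi at those points.
    """
    d1 = dfa + dfb - 3.0 * (fa - fb) / (a - b)
    radicand = d1 * d1 - dfa * dfb
    if radicand < 0.0:
        # No real minimizer of the cubic: fall back to bisection.
        return 0.5 * (a + b)
    d2 = math.copysign(math.sqrt(radicand), b - a)
    return b - (b - a) * (dfb + d2 - d1) / (dfb - dfa + 2.0 * d2)
```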

Table 1: The results of the new method compared with the Polak-Ribière-Polyak (PRP) method.

Test Function     P        PRP NOI    PRP NOF    New NOI    New NOF

Wolfe             5        14         29         12         20
                  10       32         65         25         50
                  100      49         99         42         80
                  500      58         117        48         90
                  1000     64         129        48         90
                  5000     99         214        83         161

Mile              5        37         116        29         89
                  10       37         116        29         89
                  100      44         148        29         89
                  500      44         148        29         89
                  1000     50         180        50         180
                  5000     50         180        50         180

Central           5        22         159        22         159
                  10       22         159        22         159
                  100      22         159        22         159
                  500      23         171        23         170
                  1000     23         171        23         170
                  5000     30         270        22         159

Powell            5        40         120        30         80
                  10       40         120        32         90
                  100      43         135        32         90
                  500      46         150        32         90
                  1000     46         150        38         97
                  5000     50         180        38         97

Sum               5        6          39         6          39
                  10       6          34         6          34
                  100      14         80         11         59
                  500      21         123        17         86
                  1000     23         127        17         86
                  5000     31         145        25         120

Wood              5        29         67         29         67
                  10       29         67         29         67
                  100      30         69         30         69
                  500      30         69         30         69
                  1000     30         69         30         69
                  5000     30         69         30         69

Cubic             4        15         45         14         39
                  10       16         47         14         39
                  100      16         47         14         39
                  500      16         47         14         39
                  1000     16         47         15         43
                  5000     16         47         15         43

Rosen             4        30         85         28         65
                  10       30         85         28         65
                  100      30         85         28         65
                  500      30         85         28         65
                  1000     30         85         28         65
                  5000     30         85         28         65

Total                      1539       5233       1324       4193

 

 

Table 2: The improvement percentage of the new method compared with the Polak-Ribière-Polyak (PRP) method.

 

Tools     PRP        New

NOI       100%       86.02989 %

NOF       100%       80.12612268 %
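For clarity, the percentages in Table 2 follow directly from the totals in Table 1:

$$\frac{\text{NOI}_{\text{New}}}{\text{NOI}_{\text{PRP}}} = \frac{1324}{1539} \approx 86.03\%, \qquad \frac{\text{NOF}_{\text{New}}}{\text{NOF}_{\text{PRP}}} = \frac{4193}{5233} \approx 80.13\%.$$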

 

 

3.2 Application of the New Method to Regression Analysis

        Regression analysis is a crucial statistical method frequently utilized in areas such as accounting, economics, management, physics, finance, and beyond (Christensen, 1996; Vandeginste, 1989). It is employed to examine the relationship between independent and dependent variables within different datasets. The regression model can be outlined as follows:

$$y = f(x) + \varepsilon,$$

where $x$ is the predictor, $y$ is the response variable, and $\varepsilon$ is the error term. The linear regression function $f$ is derived such that the sum of squared errors

$$\sum_{i=1}^{m} \left[ y_i - f(x_i) \right]^2$$

is minimized. This method is typically applied when the relationship between $x$ and $y$ can be represented by a straight line, although such cases are rare. As a result, nonlinear regression models are often employed. This paper focuses on the nonlinear regression approach.

This subsection provides a detailed overview of population estimates for the Kurdistan Region of Iraq (KRI) from 1965 to 2020. The statistics in Table 3 are sourced from data collected by the Kurdistan Region Statistics Office, Ministry of Planning, Kurdistan Regional Government (Kurdistan Region Statistics Office, Ministry of Planning, 2021). In this analysis, the years of data collection are represented by the $t$-variable, while the population figures for KRI serve as the $y$-variable. Data from 1965 to 2014 are used for fitting the model, while the data for 2020 are reserved for error analysis.

Table 3: Population estimates of KRI.

Years     KRI Population

1965      902,000

1987      2,015,466

1997      2,861,701

2014      5,332,600

2020      6,171,083

 

The approximate function for the nonlinear least squares method is derived using the data in the above table as follows:

.                  (3.1)

The function (3.1) is used to approximate the value of $y$ based on the value of $t$. Let $t$ denote the year of data collection and $y$ the corresponding KRI population. The above least squares method (3.1) is then transformed into the following unconstrained minimization problem:

$$\min_{x \in \mathbb{R}^3} f(x) = \sum_{i=1}^{m} \left[ y_i - \left( x_1 + x_2 t_i + x_3 t_i^2 \right) \right]^2,                 (3.2)$$

Data from 1965 to 2014 are used to develop the nonlinear quadratic model using the least squares method, along with the associated test function for the unconstrained optimization problem. It is evident from this analysis that the values of $t$ and $y$ exhibit a parabolic relationship, as described by the regression function defined in (3.2) with the regression parameters $x_1$, $x_2$, and $x_3$.

However, the data for 2020 is excluded from the unconstrained optimization function to facilitate the calculation of relative errors for the predicted data. Consequently, the proposed method is used to solve the test function (3.2) utilizing the strong Wolfe line search technique, with the results presented in Table 4.
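As an illustrative sketch of how the fitting data lead to an unconstrained problem of the form (3.2): the rescaling of the data and the use of SciPy's generic CG routine below are our own choices, made only to keep the example well conditioned and self-contained; the paper's own algorithm and cubic line search are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

# Population data from Table 3; 1965-2014 are used for fitting, 2020 is held out.
years = np.array([1965.0, 1987.0, 1997.0, 2014.0])
pop   = np.array([902_000.0, 2_015_466.0, 2_861_701.0, 5_332_600.0])

# Rescaling (decades since 1965, population in millions) is a conditioning
# choice for this sketch, not something stated in the paper.
t = (years - 1965.0) / 10.0
y = pop / 1e6

def sse(x):
    """Sum-of-squares objective in the spirit of (3.2) for the quadratic model."""
    r = y - (x[0] + x[1] * t + x[2] * t**2)
    return r @ r

# SciPy's generic CG minimizer stands in for the proposed algorithm here.
res = minimize(sse, x0=np.ones(3), method="CG")
a, b, c = res.x

t2020 = (2020.0 - 1965.0) / 10.0
pred_2020 = (a + b * t2020 + c * t2020**2) * 1e6        # back to persons
rel_err = abs(pred_2020 - 6_171_083.0) / 6_171_083.0    # compare with Table 5
print(pred_2020, rel_err)
```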

Table 4: Test results for optimization of quadratic model using new method.

Initial Points     No. of Iterations     CPU Time

(1, 1, 1)          13                    0.015291

(2, 2, 2)          13                    0.028205

(10, 10, 10)       13                    0.015354

(15, 15, 15)       15                    0.023973

 

Table 5 displays the relative error of the new method compared to the least squares method. A smaller relative error value indicates greater accuracy and a better fit to the observed dataset.

 

Table 5: Estimation point and relative errors for 2020 data.

Models           Point            Relative Error

New              6195047.5029     0.0039

Least Square     6375777.5441     0.0332
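As a check on Table 5, using the standard relative-error definition (an assumption consistent with the reported figures), the entry for the new method can be verified against the 2020 value in Table 3:

$$\text{Relative error} = \frac{|\hat{y}_{2020} - y_{2020}|}{y_{2020}} = \frac{|6195047.5029 - 6171083|}{6171083} \approx 0.0039.$$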

 

Conclusion   

        This work introduces a novel CG method that is based on the logistic mapping formula to enhance optimization problem-solving. The proposed method satisfies both the descent and sufficient descent conditions, with the latter being a stronger criterion that significantly improves numerical performance. A comprehensive analysis of the global convergence properties confirms the method's effectiveness, establishing it as a robust approach in the optimization field. Numerical experiments demonstrate that the new method outperforms the traditional Polak-Ribière-Polyak CG method, particularly in terms of the number of iterations (NOI) and the number of function evaluations (NOF). The results indicate that the proposed algorithm achieves superior convergence rates and requires fewer computational resources, making it especially valuable for large-scale problems with dimensions up to 5000. Furthermore, the application of the new method to regression analysis, specifically in modelling population estimates for the Kurdistan Region of Iraq, reveals its practical utility. The model produced accurate predictions while maintaining low relative error rates compared to traditional least squares methods. Future research may explore additional refinements and broader applications to further enhance its effectiveness in complex optimization challenges.

REFERENCES

Andrei, N. (2008). Another hybrid conjugate gradient algorithm for unconstrained optimization. Numerical Algorithms, 47(2), 143–156. https://doi.org/10.1007/s11075-007-9152-9

Christensen, R. (1996). Analysis of Variance, Design, and Regression: Applied Statistical Methods. CRC Press, Chapman and Hall.

Dai, Y. H., & Yuan, Y. (1999). A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization, 10(1), 177–182. https://doi.org/10.1137/S1052623497318992

Dai, Y. H., & Yuan, Y. (2001). An Efficient Hybrid Conjugate Gradient Method for Unconstrained Optimization. Annals of Operations Research, 103(1), 33–47. https://doi.org/10.1023/A:1012930416777

Fletcher, R. (1987). Practical Methods of Optimization, Unconstrained Optimization (Vol. 1). John Wiley and Sons.

Fletcher, R., & Reeves, C. M. (1964). Function minimization by conjugate gradients. The Computer Journal, 7(2), 149–154. https://doi.org/10.1093/comjnl/7.2.149

Gilbert, J. C., & Nocedal, J. (1992). Global Convergence Properties of Conjugate Gradient Methods for Optimization. SIAM Journal on Optimization, 2(1), 21–42. https://doi.org/10.1137/0802003

Hestenes, M. R., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409. https://doi.org/10.6028/jres.049.044

Hu, Y. F., & Storey, C. (1991). Global convergence result for conjugate gradient methods. Journal of Optimization Theory and Applications, 71(2), 399–405. https://doi.org/10.1007/BF00939927

Ibrahim, A. L., & Mohammed, M. G. (2022). A new three-term conjugate gradient method for training neural networks with global convergence. Indonesian Journal of Electrical Engineering and Computer Science, 28(1), 551–558. https://doi.org/10.11591/ijeecs.v28.i1.pp551-558

Ibrahim, A. L., & Mohammed, M. G. (2024). A new conjugate gradient for unconstrained optimization problems and its applications in neural networks. Indonesian Journal of Electrical Engineering and Computer Science, 33(1), 93–100. https://doi.org/10.11591/ijeecs.v33.i1.pp93-100

Ibrahim, A. L., & Shareef, S. G. (2019). A new class of three-term conjugate Gradient methods for solving unconstrained minimization problems. General Letters in Mathematics, 7(2). https://doi.org/10.31559/glm2019.7.2.4

Jahwar, B. H., Ibrahim, A. L., Ajeel, S. M., & Shareef, S. G. (2024). Two new classes of conjugate gradient method based on logistic mapping. Telkomnika (Telecommunication Computing Electronics and Control), 22(1), 86–94. https://doi.org/10.12928/TELKOMNIKA.v22i1.25264

Kurdistan Region Statistics Office, Ministry of Planning, K. R. G. (2021). Population Statistics. https://krso.gov.krd/en/statistics/population

Liu, Y., & Storey, C. (1991). Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications, 69(1), 129–137. https://doi.org/10.1007/BF00940464

Lu, H., Zhang, H., & Ma, L. (2006). New optimization algorithm based on chaos. Journal of Zhejiang University - Science A: Applied Physics & Engineering, 7, 539–542. https://doi.org/10.1631/jzus.2006.A0539

Polak, E., & Ribiere, G. (1969). Note sur la convergence de méthodes de directions conjuguées. Revue Française d’informatique et de Recherche Opérationnelle. Série Rouge, 3(R1), 35–43. http://www.numdam.org/item/M2AN_1969__3_1_35_0/

Polyak, B. T. (1969). The conjugate gradient method in extremal problems. Zh. Vychisl. Mat. Mat. Fiz., 9(4), 807–821.

Shareef, S. G., & Ibrahim, A. L. (2016). A New Conjugate Gradient for Unconstrained Optimization Based on Step Size of Barzilai and Borwein. Science Journal of University of Zakho, 4(1), 104–114. https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/311

Touati-Ahmed, D., & Storey, C. (1990). Efficient hybrid conjugate gradient techniques. Journal of Optimization Theory and Applications, 64(2), 379–397. https://doi.org/10.1007/BF00939455

Vandeginste, B. G. M. (1989). Nonlinear regression analysis: Its applications, D. M. Bates and D. G. Watts, Wiley, New York, 1988. Journal of Chemometrics, 3, 544–545. https://api.semanticscholar.org/CorpusID:120419559

Zhang, L., Zhou, W., & Li, D.-H. (2006). A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA Journal of Numerical Analysis, 26(4), 629–640. https://doi.org/10.1093/imanum/drl016

Zoutendijk, G. (1970). Nonlinear programming, computational methods. In J. Abadie (Ed.), Integer and Nonlinear Programming (pp. 37–86). North-Holland, Amsterdam.