THE ASTRINGENCY OF THE GP ALGORITHM FOR FORECASTING SOFTWARE FAILURE DATA SERIES

The forecasting of software failure data series by Genetic Programming (GP) can be realized without any assumptions before modeling. This discovery has transformed traditional statistical modeling methods as well as improved consistency for model applicability. The individuals’ different characteristics during the evolution of generations, which are randomly changeable, are treated as Markov random processes. This paper also proposes that a GP algorithm with “optimal individuals reserved strategy” is the best solution to this problem, and therefore the adaptive individuals finally will be evolved. This will allow practical applications in software reliability modeling analysis and forecasting for failure behaviors. Moreover it can verify the feasibility and availability of the GP algorithm, which is applied to software failure data series forecasting on a theoretical basis. The results show that the GP algorithm is the best solution for software failure behaviors in a variety of disciplines.


INTRODUCTION
Because of the increasingly broad application and importance of software, the quality that people demand of software is becoming higher and higher. The appraisal and prediction of software reliability, a significant characteristic for assessing software quality, has been an active focus of study.
Models for predicting software reliability are the kernels of this research.
During the process of software reliability research, we have already created many different reliability models.
When applying them for reliability prediction, we always face many problems, such as which model to select and whether the prediction result is credible. Because the capabilities of the models are difficult to identify, operators, who are seldom familiar with every model, tend to select models blindly. Meanwhile, there are numerous inconsistencies in software reliability prediction. For example, different models give different prediction results for the same software system. The prediction quality of the same model may differ greatly from one data set to another. The same model may yield very different prediction quality for different phases of the same software system. In some cases, one or several models used for prediction all have low prediction quality. A discussion of these problems can be found in Cong, Lu, and Bai (2000) and Wang and Jin (2002).
In order to solve the problems above, this paper makes use of the GP algorithm for forecasting software failure data. It then analyzes and verifies that the GP algorithm with an "optimal individual reserved strategy" converges, which indicates that it is possible to obtain the best solution for software failure data. This approach can evolve a specific program for a specific software failure data set and then obtain an approximate solution or the best one. It removes the subjective assumptions of statistical methods. It can improve the consistency of model application and the analysis of software reliability models, resulting in better forecasting of software failure behaviors.
Data Science Journal, Volume 6, Supplement, 17 May 2007

SUMMARY OF THE GP ALGORITHM
Genetic Programming is a technique based on biological evolution, which is developed from the Genetic Algorithm (GA). Here, the depth of each tree is defined as not more than $N$, where $N$ is a given positive integer. $T_i$ are the individual trees; each tree's root node satisfies $r_k \in F$ and its leaf nodes belong to $T$, in which $F$ is the function set and $T$ is the terminal set. We also have to define the fitness function $f: S \to \mathbb{R}$, which is used to search for the best individuals $T_i^*$ in the space $S$. Trees are created randomly by a growth method whose depth cannot exceed the maximum depth $D$, defined as follows: the root nodes are selected from the function set in order to generate non-trivial individuals, while the other nodes are selected from $F \cup T$ if the depth of the node to be selected is less than $D$ and from $T$ if the depth equals $D$ (Lin, Li, & Kou, 2000). We also adopt the "best-reserved strategy" to reserve the best individuals for the next generation, so that these individuals do not take part in the genetic operations, as shown in detail in section 4.1.
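The growth method described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function set `FUNCTIONS`, the terminal set `TERMINALS`, the tuple representation of trees, and the helper names `grow` and `tree_depth` are all assumptions made for the example, and the root is taken to lie at depth 0.

```python
import random

# Assumed function set F (symbol -> arity) and terminal set T; the paper
# does not list its actual sets, so these are placeholders.
FUNCTIONS = {"+": 2, "-": 2, "*": 2}
TERMINALS = ["x", 1.0, 2.0]


def grow(max_depth, rng, depth=0):
    """Create a random tree whose depth does not exceed max_depth (D).

    A function node is the tuple (symbol, child, child); a leaf is a bare
    terminal. The root is always drawn from F so the tree is non-trivial.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    if depth == 0:
        symbol = rng.choice(list(FUNCTIONS))              # root: from F only
    elif depth == max_depth:
        symbol = rng.choice(TERMINALS)                    # depth D: from T only
    else:
        symbol = rng.choice(list(FUNCTIONS) + TERMINALS)  # otherwise: F union T
    if symbol in FUNCTIONS:
        children = tuple(
            grow(max_depth, rng, depth + 1) for _ in range(FUNCTIONS[symbol])
        )
        return (symbol,) + children
    return symbol


def tree_depth(tree):
    """Depth of a tree: 0 for a leaf, 1 + deepest child for a function node."""
    if isinstance(tree, tuple):
        return 1 + max(tree_depth(child) for child in tree[1:])
    return 0


example_tree = grow(4, random.Random(0))
```

Because a function symbol can never be chosen at depth $D$, every generated tree satisfies the depth bound by construction.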

FORECASTING FOR SOFTWARE'S NEXT FAILURE TIME BY THE GP ALGORITHM
As we know, traditional software reliability models are all based on different statistical assumptions that constrain both the modes of software failure behavior a specific model can describe and the range over which each model is suitable. For data such as the error statistics from the development and testing of the Naval Tactical Data System (NTDS) at the U.S. Navy's Fleet Computer Programming Center, the evolved models obtain results that fit and forecast the corresponding data set well. Our tests show that GP can automatically evolve a model for the specified error data without any assumptions being added during modeling. This further indicates, to some extent, the feasibility and efficiency of using the GP algorithm for the evolutionary modeling of software failure data series.

Effect of optimal reserve strategy on astringency
To illustrate the necessity of the Optimal Reserve Strategy, the structure of the evolutionary process is analyzed first. If the colony of each generation created during the GP algorithm is treated as one state, the entire evolutionary process can be regarded as a random process and analyzed as a Markov chain for astringency (convergence). To this end, the concept of a "realized history" is introduced first.
Consider a strategy $\pi$ and a history $h_t = (i_0, a_0, i_1, a_1, \ldots, i_t)$ of states and behaviors. If the probability of the event $h_t$ is positive under the measure induced by strategy $\pi$, then $h_t$ is called a realized history of strategy $\pi$. In other words, under strategy $\pi$, state $i_0$ is diverted to state $i_1$ after behavior $a_0$ is adopted, and the procedure keeps going until state $i_t$ is reached at instant $t$ after behavior $a_{t-1}$ is adopted. If the probability of the whole event induced by strategy $\pi$ is not zero, the history is called a "realized history" under strategy $\pi$.
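The definition of a realized history can be illustrated with a small sketch. It assumes, for simplicity, a stationary randomized strategy that depends only on the current state; the two-state example, the dictionaries `policy`, `transition`, and `initial`, and both function names are hypothetical and are not taken from the paper.

```python
def history_probability(history, policy, transition, initial):
    """Probability of a history [i0, a0, i1, a1, ..., it] under a strategy.

    policy[i][a]          : probability that the strategy picks behavior a in state i
    transition[(i, a)][j] : probability of moving from state i to j under a
    initial[i]            : probability of starting in state i
    """
    states = history[0::2]
    actions = history[1::2]
    prob = initial.get(states[0], 0.0)
    for t, action in enumerate(actions):
        prob *= policy.get(states[t], {}).get(action, 0.0)
        prob *= transition.get((states[t], action), {}).get(states[t + 1], 0.0)
    return prob


def is_realized(history, policy, transition, initial):
    """A history is realized under the strategy if its probability is positive."""
    return history_probability(history, policy, transition, initial) > 0.0


# Hypothetical two-state example (all numbers are assumptions).
policy = {0: {"a": 1.0}, 1: {"a": 0.5, "b": 0.5}}
transition = {
    (0, "a"): {0: 0.5, 1: 0.5},
    (0, "b"): {1: 1.0},
    (1, "a"): {1: 1.0},
    (1, "b"): {0: 1.0},
}
initial = {0: 1.0}
```

In this example the history `[0, "a", 1, "b", 0]` is realized, whereas `[0, "b", 1]` is not, because the assumed strategy never adopts behavior `"b"` in state 0.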
The optimal activity set is given as follows. For each state $i \in S$, a set of practicable optimal activities of state $i$ is defined.

Theorem of Optimal Strategy:
A necessary and sufficient condition for a strategy $\pi$ to be optimal is that, whenever $h_t$ is a realized history of strategy $\pi$, the behavior chosen at $h_t$ belongs to the optimal activity set of the current state (Liu, 2004). The proof of this theorem can be found in Dong and Liu (1986). The significance lies in the following: a strategy is optimal only when each decision rule makes use of the optimal behaviors at every realized history.
From the theorem above, GP reserves the optimal individual of every generation for the next generation. This can be expressed in terms of $A_{\mathrm{best}}$, the best individual of generation $t+1$, and $A(t)$, the optimal individual of generation $t$. Therefore GP combined with the optimal strategy is still a homogeneous Markov chain. In other words, the probability of going from any state to a state that includes the optimal solution is greater than zero, whereas the probability of the reverse transition is zero. Therefore GP with the optimal strategy behaves as a non-holonomic ergodic process and always converges to the optimal solution with probability 1.
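The effect of reserving the optimal individual can be illustrated on a toy Markov chain. The sketch below is only an analogy under stated assumptions: the three colony states and the transition matrix `P_ELITIST` are invented for the example, with state 0 standing for "the colony contains the optimal individual". Reserving the optimum makes state 0 impossible to leave, while every other state reaches it with positive probability.

```python
def step(dist, matrix):
    """One step of a Markov chain: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def trajectory(dist, matrix, steps):
    """Return the list of distributions from time 0 to time `steps`."""
    dists = [list(dist)]
    for _ in range(steps):
        dists.append(step(dists[-1], matrix))
    return dists


# Assumed transition matrix. Row 0 is (1, 0, 0): once the optimum is
# reserved, the colony never loses it. Rows 1 and 2 have a positive
# probability of moving to state 0.
P_ELITIST = [
    [1.0, 0.0, 0.0],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]

# Start in state 2 (no optimal individual present) and iterate.
dists = trajectory([0.0, 0.0, 1.0], P_ELITIST, 200)
probability_of_optimum = [d[0] for d in dists]
```

In this sketch `probability_of_optimum` never decreases and approaches 1, which mirrors the convergence with probability 1 stated above.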

Analysis of GP constringency by Markov Chain
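The concept of time homogeneity is given first. As we know, the intuitive meaning of the transition probability $p_{ij}(m, m+n)$ is the probability that the chain, being in state $i$ at time $m$, is in state $j$ after $n$ steps; the chain is time homogeneous when this probability does not depend on $m$ (Zhao & Zhu, 1993).
As with the standard GA, the Markov chain of the standard GP algorithm is time homogeneous, which can be expressed as $p_{ij}(m, m+n) = p_{ij}(n)$, where $i$ and $j$ are states and $m$ is the initial time. The initial distribution of the population can be random because the initial distribution has no effect on the limiting behavior of a Markov chain.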

The probability of standard GP converging to the optimum is less than 1. This can be shown as follows. The possible states of a colony can be divided into two types: $I_0$, which includes optimal individuals, and $I_n$, which does not include optimal individuals. The claim is that the stable probability $P_1$ of being transferred to state $I_0$ is less than 1, which can be proved by contradiction. Assume that the probability that GP converges to the optimum is 1, that is to say, that the probability of evolving to state $I_n$ is zero. During the evolution process of GP, the transfer of the colony from state $i \in I$ to state $j \in I$ by duplication, crossover, and mutation is described by the transition probabilities of the respective genetic operators. Stochastic matrices can then be constructed separately for the three operators, and from them the transition matrix $R = (r_{ij})$ of the colony states of GP. Because these are all stochastic matrices, and with $H$ denoting the Hamming distance between $i$ and $j$, the inequality $r_{ij} > 0$ can be easily proven. In other words, the matrix $R$ is positive, that is, all of its entries are strictly positive. At time $t$, the probability of the colony being in state $j$ is determined by the initial distribution and $R$. According to the property of a homogeneous Markov chain, the stable probability distribution is independent of the initial probability distribution, so the limiting probability of every state, including the states in $I_n$, is greater than 0. This is inconsistent with the assumption above, and the theorem is established.
Therefore, it has been proved that standard GP is astringent (convergent) provided that the optimal reserve strategy is adopted. Otherwise, astringency is not certain.
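The role of the positive transition matrix can be checked numerically on a small example. The matrix `R_STANDARD` below is an assumption chosen only so that every entry is strictly positive; it is not derived from actual GP operators. Raising it to a high power shows that all rows become identical (the limit does not depend on the starting state) and that every state keeps a positive limiting probability, so the state containing the optimum (state 0 here) cannot have probability 1.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def matrix_power(matrix, t):
    """Compute matrix**t by repeated multiplication (t >= 0)."""
    n = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, matrix)
    return result


# Assumed positive stochastic matrix: each row sums to 1 and r_ij > 0.
R_STANDARD = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]

# Row i of the result is the distribution after 200 steps when starting in i.
R_LIMIT = matrix_power(R_STANDARD, 200)
```

Each row of `R_LIMIT` approximates the same stable distribution, and its first entry stays strictly below 1 because the other two states retain positive probability.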

Summary
Assume that the evolutionary colony of generation $k$ is $\{x_1^k, \ldots, x_n^k\}$ during the implementation of GP, where $n$ is the size of the colony, and consider the maximum fitness of the individuals in the current generation. When GP is used in practice, the optimal strategy should be used: it selects the optimal individual in the initial generation and reserves it in the colony as the $(n+1)$-th individual, which means that it never participates in the evolutionary operations leading from the initial generation to the next.
Similarly, for a later generation $k$, the optimal individual selected from generation $k$ is compared with the optimal individual reserved from the previous generation. The better one is then added to the colony as the optimal individual of the current generation, denoted $x_{n+1}^k$, the $(n+1)$-th individual. However, it never participates in any evolutionary operation. We then obtain the corresponding equality when the optimal reserved strategy is used.
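The reserve step described above can be sketched in a few lines. To keep the example short it is not real GP: individuals are plain numbers rather than trees, `fitness` is an invented function with its maximum at 3, and tournament selection with Gaussian perturbation stands in for the genetic operations. Only the handling of the reserved $(n+1)$-th individual follows the text.

```python
import random


def fitness(x):
    """Assumed toy fitness function, maximal at x = 3."""
    return -(x - 3.0) ** 2


def evolve_with_reserve(n, generations, rng):
    """Evolve n individuals while keeping a reserved (n+1)-th individual.

    The reserved individual never takes part in selection or variation; it
    is replaced only when the current generation contains a better one.
    """
    colony = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    reserved = max(colony, key=fitness)
    trace = [fitness(reserved)]
    for _ in range(generations):
        offspring = []
        for _ in range(n):
            # Tournament of size 2, then a small random perturbation.
            parent = max(rng.sample(colony, 2), key=fitness)
            offspring.append(parent + rng.gauss(0.0, 0.5))
        colony = offspring
        challenger = max(colony, key=fitness)
        if fitness(challenger) > fitness(reserved):
            reserved = challenger
        trace.append(fitness(reserved))
    return reserved, trace


best, trace = evolve_with_reserve(n=10, generations=30, rng=random.Random(0))
```

By construction the recorded fitness of the reserved individual in `trace` can never decrease from one generation to the next, whatever the genetic operations do to the rest of the colony.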
As $D$ is a constant independent of $k$ and the relation holds for all $k \ge 1$, GP is not convergent in this case. In short, the astringency of GP cannot be guaranteed if no mutation operator of any form is applied.
Moreover, no matter how large the population is, it is finite. Sampling errors of the genetic operations are inevitable, which may make certain elements of $F$ and $T$ needed by individuals in $X^*$ disappear from the colony after a number of generations. Without mutation, these elements may never have a chance to re-enter the colony.
In that case there is no possibility of obtaining an optimal solution. In a word, GP is not likely to be astringent unless certain forms of mutation or the optimal reserved strategy are utilized (Jia, Kang, & Chen, 2003).
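The loss of elements through sampling error can be reproduced with a small experiment. As a simplification, individuals are fixed-length strings over an assumed alphabet standing in for $F \cup T$, not trees; parents are drawn uniformly at random and recombined by one-point crossover, and no mutation is applied. All names and parameters are hypothetical.

```python
import random

ALPHABET = "+-*x1"  # assumed stand-in for the symbols of F and T


def symbols_in(colony):
    """Set of symbols that occur anywhere in the colony."""
    return {symbol for individual in colony for symbol in individual}


def next_generation(colony, rng):
    """Resample parents and apply one-point crossover, without mutation."""
    offspring = []
    while len(offspring) < len(colony):
        mother = rng.choice(colony)
        father = rng.choice(colony)
        cut = rng.randrange(1, len(mother))
        offspring.append(mother[:cut] + father[cut:])
    return offspring


def run_without_mutation(colony, generations, rng):
    """Return the set of symbols present in each generation."""
    history = [symbols_in(colony)]
    for _ in range(generations):
        colony = next_generation(colony, rng)
        history.append(symbols_in(colony))
    return history


rng = random.Random(1)
start = ["".join(rng.choice(ALPHABET) for _ in range(4)) for _ in range(6)]
history = run_without_mutation(start, 40, rng)
```

Since crossover only recombines material that is already present, each set in `history` is a subset of the previous one: a symbol that has been lost cannot return unless some form of mutation reintroduces it.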

CONCLUSION
Forecasting of software failure data by Genetic Programming removes some of the subjective assumptions of statistical models and adds consistency in the application of the models. This is of practical value for the analysis of software reliability models and for forecasting software failure behaviors. This paper treats the different states during the evolution of individuals in GP as Markov random processes and shows that GP converges to the best solution if the "best-individuals reserved strategy" is used, so that better individuals are evolved as a consequence. It is proved that the GP algorithm is able to obtain a better solution, and it is likely to be feasible and available for practical applications. However, the convergence speed (the speed at which best solutions are reached) may be reduced by the choice of generation size and the setup of the genetic operations, and the computing time may be too long, which affects the efficiency of problem solving. All of these issues should be studied further. In other words, the elements related to the convergence speed should be improved so that the method is better suited to modeling and forecasting time series problems.

Figures 1 and 2 give the transformation curve (model simulation output) for models and the true data series.

Figure 1. Simulation result for SYS1
Figure 2. Simulation result for SYS2
From the figures above, we can conclude that the GP model has better applicability (accuracy) in comparison with other models, as well as a better goodness of fit. Furthermore, SYS3 provided by the Musa record set has also been modeled by automatic programming, as have the test example (from one to sixteen data series) provided by the Armed Forces Engineering Institute and the error statistic data (from one to thirty data series) provided by the development and testing procedure of the Naval Tactical Data System (NTDS) of the U.S. Navy's Fleet Computer Programming Center.
The transition probability is the probability of transferring from state $i$ to state $j$, considering time from $m$ to $n$. In other words, no matter when the system starts from state $i$, the probability of being transferred to state $j$ after $n$ steps is identical. In this case the Markov chain is time homogeneous.
Without the optimal reserved strategy, consider the nontrivial case in which $f$ is not a constant value. When $S$ is finite, the result is that $m$ exists and $m > 0$.