Maximum diversity problem

作者:
Juan Francisco Gómez González and Miguel Ángel Beltrán Sánchez
最近上传:
7 年前
许可:
Creative Commons CC BY 4.0
摘要:
We observe three differents algorithms, GRASP, Tabu Search best and Tabu Search first, for the maximum diversity problem and we get the best of those.
标签:
Reports
\begin{now}
了解为什么全世界有超过20万人信任 Overleaf 并把工作交给它。
免费注册探索所有订阅计划
\documentclass[11pt, english]{article}
\usepackage{graphicx}
\usepackage[colorlinks=true, linkcolor=blue]{hyperref}
\usepackage[english]{babel}
\selectlanguage{english}
\usepackage[utf8]{inputenc}
\usepackage[svgnames]{xcolor}



\usepackage{listings}
\usepackage{afterpage}
\pagestyle{plain}

\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}

%\lstset{language=R,
%    basicstyle=\small\ttfamily,
%   stringstyle=\color{DarkGreen},
%    otherkeywords={0,1,2,3,4,5,6,7,8,9},
%    morekeywords={TRUE,FALSE},
%    deletekeywords={data,frame,length,as,character},
%    keywordstyle=\color{blue},
%    commentstyle=\color{DarkGreen},
%}

\lstset{frame=tb,
language=R,
aboveskip=3mm,
belowskip=3mm,
showstringspaces=false,
columns=flexible,
numbers=none,
keywordstyle=\color{blue},
numberstyle=\tiny\color{gray},
commentstyle=\color{dkgreen},
stringstyle=\color{mauve},
breaklines=true,
breakatwhitespace=true,
tabsize=3
}

\usepackage{here}


\textheight=21cm
\textwidth=17cm
%\topmargin=-1cm
\oddsidemargin=0cm
\parindent=0mm
\pagestyle{plain}

%%%%%%%%%%%%%%%%%%%%%%%%%%
% La siguiente instrucciÃ³n pone el curso automÃ¡ticamente%
%%%%%%%%%%%%%%%%%%%%%%%%%%

\usepackage{color}
\usepackage{ragged2e}

\global\let\date\relax
\newcounter{unomenos}
\setcounter{unomenos}{\number\year}
\addtocounter{unomenos}{-1}
\stepcounter{unomenos}
\gdef\@date{ Course \arabic{unomenos}/ 2019}

\begin{document}

\begin{titlepage}

\begin{center}
\vspace*{-1in}
\begin{figure}[htb]
\begin{center}
\includegraphics[width=8cm]{logo}
\end{center}
\end{figure}

FACULTY OF MATHEMATICS - \@date\\
\vspace*{0.15in}
STATISTICS AND OPERATIONAL RESEARCH DEPARTMENT \\
\vspace*{0.4in}
\begin{large}
HEURISTIC ALGORITHMS:\\
\end{large}
\vspace*{0.2in}
\begin{Large}
\textbf{Max-Diversity} \\
\end{Large}
\vspace*{0.3in}
\begin{large}
Rafael Martí Cunquero \\
\end{large}
\vspace*{0.3in}
\rule{80mm}{0.1mm}\\
\vspace*{0.1in}
\begin{large}
Made by: \\
Gómez González, Juan Francisco \\
Beltrán Sánchez, Miguel Ángel \\
\end{large}
\includegraphics[width=10cm]{LogoFac.jpg}
\end{center}
\end{titlepage}

\newcommand{\CC}{C\nolinebreak\hspace{-.05em}\raisebox{.4ex}{\tiny\bf +}\nolinebreak\hspace{-.10em}\raisebox{.4ex}{\tiny\bf +}}
\def\CC{{C\nolinebreak[4]\hspace{-.05em}\raisebox{.4ex}{\tiny\bf ++}}}

\tableofcontents
\newpage
\section{Introduction and important things}
This paper is for the subject Models in operation research. This paper consists of the study of two types of algorithms, one of them is GRASP(Greedy Randomized Adaptive Search Procedure), did it in one class with our teacher Rafa, and the other follows the methodology of \textbf{\textit{tabu search}}, did it by ourselves. Our study consist of solve the maximum diversity problem (MDP). The problem of maximum diversity consists of selecting a certain number of elements from among all the available ones in order to obtain the greatest diversity. If we draw in a plane the elements represented by points, we look for those whose sum of distances between them is greater, hence the term of greatest diversity. \\

To do our paper, we need two parts; first of all we need to calibrate the parameters of our algorithms and secondly, we will do the comparative between both algorithms. Both parts will be done in a collection of 8 problems with size 500. \\

It is noted that the programming language that we have used in the implementation of both algorithms has been Dev-\CC \ 5.11. Obviously we have run the algorithms, whom we are going to speak later, in the same computer which has the following characteristics: Intel(R) Core(TM) i3-3217U CPU 1.8GHz. Then we have had the results with the limitations of our computer, but we always guaranty that the results are fairly taken in both algorithms with the same computation time. The C code has been attached by e-mail. We have chosen this way for two reasons: The reader will be more interested in the selection of the parameters and in the comparative between methods, and we don't want to do unnecessary papers, that makes the reading more fluid. \\

Previous the study of the parameters, we will comment a few things about how works our algorithms. GRASP tries to get better solutions by changing the element of the solution (local search) and our initial solution is, as its name told us, a greedy randomize. The start solution of the other algorithm is a greedy one and later we use Tabu Search which consists on get one element of the solution and changes it to a better one, if we can't get a better one then we change it to the best of the worst,i.e., we get the element with the higher value. The objective of worsen is to get a new space of solutions and try to improve our solution in this new space.

\section{Calibration of the parameters}
Both algorithms have parameters. In the first method we have the parameter $\alpha$ which defines the list of candidates in the neighborhood of the solution. $\alpha$ with the randomize to get our different solutions. We have $\alpha \in [0,1]$, if we put $\alpha=0$ we get a randomize algoritm and with $\alpha=1$ we get a random element and then we use the greedy construction to get the others elements of the solution. We are looking for the best $\alpha$ for the maximum diversity problem. \\

In the same way, in Tabu Search we have the \textit{tenure} as a parameter. This parameter tells us with element can be selected again for adding it in our solution, this is why it's very important to determinate the value of this parameter. First of all we are going to determinate the parameter $\alpha$ of the GRASP and secondly the \textit{tenure} of Tabu Search. We implement two methods, first and best in the Tabu Search so, we need to determinate the tenure for both algorithms. The algorithm Tabu first does the same as Tabu seach, we try to improve the solution by change elements of the solution, if we find some element we change it and we doesn't mine if it only improve 1 or 2 points in the objective function (we will see that in some problems this algorithm is not the best), the Tabu best consist of explore all the possibles neighborhood and we don't improve our solution until we don't explore all the elements and when we see all the elements we improve with the best of them (if we can't improve we get the best of the worse like we have said), as you can see this algorithm has more computational time than the other one. \\

We let two minutes for each example, for each algorithm and for each $\alpha$, \textit{tenure} that we have tried. In our statistical analysis we are aware that we only have eight examples and we know our limitations.


\subsection{GRASP and its parameter $\alpha$}
The values of the objective function has been taken in the following table:

\begin{table}[h]
  \begin{center}
    \label{tab:table1}
    \begin{tabular}{|c|c|c|c|c|c|c|c|c|} 
    \hline
      $\alpha$ & Amparo & Borja & Daniel & Emilio & Jose & Maria Jesús & Raquel & Virginia\\
      \hline
     0.5  & 21461 & 21453 & 21412 & 21684 & 21503 & 21513 & 21676 & 21426\\ \hline
     0.6 & 21553 & 21481 & 21406 & 21684 & 21487 & 21677 & 21685 & 21750\\ \hline
     0.7 & 21793 & 21660 & 21697 & 21684 & 21577 & 21783 & 21818 & 21518\\ \hline
     0.8 & 21783 & 21816 & 21735 & 21942 & 21952 & 21885 & 21884 & 21730\\ \hline
     0.9 & 22080 & 22118 & 21895 & 22045 & 21951 & 22035 & 22008 & 21910\\
\hline
    \end{tabular}
    \caption{2 minutes for each example to get our $\alpha$}
  \end{center}
\end{table}

In our statistical analysis, the response variable and the factor of study are the following:

\textbf{Response variable (r.v.):} value of the objective function in pur algorithm GRASP. \\
\textbf{Factor:} $\alpha$

We answer the question: There is any significant difference between the mean of the r.v. for our chosen $\alpha$?
We will have done the statistical analysis with \textit{software} R. Our statistical study will divide in two parts:
1) Graphic representation and numeric data and 2) hypothesis contrasting. \\

\subsubsection{Graphic representation and numeric data}

We are going to see in the next code in R (the graphic representation will be done by a box plot). \\

NUMERIC REPRESENTATION
\begin{lstlisting}
Amp=c(21461,21553,21793,21783,22080)
Bor=c(21453,21481,21660,21816,22118)
Dan=c(21412,21406,21697,21735,21895)
Emi=c(21684,21684,21684,21942,22045)
Jos=c(21503,21487,21577,21952,21951)
Mar=c(21513,21677,21783,21885,22035)
Raq=c(21676,21685,21818,21884,22008)
Vir=c(21426,21750,21518,21730,21910)
datos=cbind(Amp,Bor,Dan,Emi,Jos,Mar,Raq,Vir)
rownames(datos)=c(".5",".6",".7",".8",".9")
datos<-t(datos)

#mean
medias<-apply(datos,2,mean) #mean for each alpha

#summary
repnum=apply(datos,2,summary) #NUMERIC REPRESENTATION
varianza<-apply(datos,2,var)
std=varianza^(1/2) #standard deviation for each alpha
\end{lstlisting}
We get the next outputs:

\begin{lstlisting}
> means
      .5       .6       .7       .8       .9 
21516.00 21590.38 21691.25 21840.88 22005.25

> repnum
              .5       .6       .7       .8       .9
Min.    21412.00 21406.00 21518.00 21730.00 21895.00
1st Qu. 21446.25 21485.50 21639.25 21771.00 21940.75
Median  21482.00 21615.00 21690.50 21850.00 22021.50
Mean    21516.00 21590.38 21691.25 21840.88 22005.25
3rd Qu. 21553.75 21684.25 21785.50 21899.25 22053.75
Max.    21684.00 21750.00 21818.00 21952.00 22118.00

> std
       .5        .6        .7        .8        .9 
106.84568 124.63884 106.21239  87.71128  80.12802 
\end{lstlisting}

Graphic representation
\begin{lstlisting}
#Box plot
par(mfrow=c(1,5))
boxplot(datos[,1],xlab="alpha=.5", ylab="f.o.",ylim=c(21400,22200))
boxplot(datos[,2],xlab="alpha=.6", ylab="f.o.",ylim=c(21400,22200))
boxplot(datos[,3],xlab="alpha=.7", ylab="f.o.",ylim=c(21400,22200))
boxplot(datos[,4],xlab="alpha=.8", ylab="f.o.",ylim=c(21400,22200))
boxplot(datos[,5],xlab="alpha=.9", ylab="f.o.",ylim=c(21400,22200))
\end{lstlisting}
We have got: \\

\includegraphics[width=15cm]{Rplot.eps}
\\ in which we can see that $\alpha=0.9$
We can see, and our data tell that, $\alpha=0.9$ is the best of all.It is also remarkable that the biggest interquartile range is in  $\alpha=0.6$, that means that the data has more diversity between them. \\

\subsubsection{Hypothesis contrasting}
Now we will have done the hypothesis contrasting, which our null hypothesis $H_0$ is $\mu_{0.5}=\mu_{0.6}=\mu_{0.7}=\mu_{0.8}=\mu_{0.9}$, and our alternative hypothesis $H_A$ is that there are differences between the means. With our ANOVA's knowledge, we can apply it because our data don't have the conditions for the applicability, for example, we don't have independence between variables because we share examples. We should use a non parametric contrasting like the Test of Friedman o Kruskal-Wallis. We are in front of a multiple contrasting with the same means. We are going to see it in the next script of R. 

\begin{lstlisting}
friedman.test(data) #test no parametrico

Friedman rank sum test

data:  datos
Friedman chi-squared = 26.051, df = 4, p-value = 3.09e-05
\end{lstlisting}

p-value = $3.09e-05 < 0.5$ we have enough evidence with $95\%$  that we can't assume that the means are the same.  Moreover, the test finds enough evidences between at least in four groups.\\

Joint it with the graphic description, numeric description and the test, we have determined that our best $\alpha$ is $0.9$. We wanted to use (and we tried) a comparation post-hoc for the test of Friedman, we tried the test of range with sing of Wilcoxon pairwise.wilcox.test( paried = TRUE ) or Tukey’s range test: in R with the function fiendmanmc() the packege pgirmess, but we need the knowledge about non parametric methods, because we haven't studied it in the degree so, we don't know how to use it.

\subsection{Tabu Search (\textit{First}) and its parameter \textit{tenure}.}
We have studied forty \textit{tenure} (from 1 to 40) for every Tabu (\textit{First} and \textit{Best}). The output is a big table so we have chosen the best of them to let you see how it works. In Tabu \textit{First} if we choose \textit{tenures} bigger than ten, our solution doesn't improve so, the best solution was the greedy. Then we choose the first ten tenure to show in the paper.

\begin{table}[H]
  \begin{center}
    \label{tab:table1}
    \begin{tabular}{|c|c|c|c|c|c|c|c|c|} 
    \hline
      $\textit{tenure}$ & Amparo & Borja & Daniel & Emilio & Jose & Maria Jesús & Raquel & Virginia\\
      \hline
     1  & 20988 & 21249 & 21193 & 20796 & 21304 & 21324 & 21293 & 21128\\ \hline
     2 &  21267 & 20926 & 21275 & 21310 & 21174 & 21032 & 21071 & 21008\\ \hline
     3 & 21121 & 21136 & 21088 & 21111 & 21081 & 20861 & 21207 & 21023\\ \hline
     4 & 21006 & 20970 & 20958 & 20999 & 21049 & 20918 & 21076 & 20983\\ \hline
     5 & 20829 & 20766 & 20819 & 20806 & 21049 & 20807 & 21139 & 20727\\ \hline
     6 & 20653 & 20814 & 20719 & 20790 & 21049 & 20791 & 21028 & 20689\\ \hline
     7 & 20606 & 20734 & 20610 & 20790 & 21049 & 20791 & 20889 & 20721\\ \hline
     8 & 20583 & 20717 & 20610 & 20790 & 21049 & 20791 & 20734 & 20568\\ \hline
     9 & 20575 & 20717 & 20610 & 20790 & 21049 & 20791 & 20651 & 20545\\ \hline
     10 & 20575 & 20717 & 20610 & 20790 & 21049 & 20791 & 20728 & 20535\\

\hline
    \end{tabular}
    \caption{2 minutes of computation for every example.}
  \end{center}
\end{table}
In our statistical analysis, the response variable and the factor of study are the following.\\

\textbf{Response variable (r.v.):} value of the objective function in our Tabu \textit{First} algorithm. \\
\textbf{Factor:} \textit{tenure}
We answer the question: There is any significant difference between the mean of the r.v. for our chosen $\alpha$?
We will have done the statistical analysis with \textit{software} R. Our statistical study will divide in two parts:
1) Graphic representation and numeric data and 2) hypothesis contrasting. \\

\subsubsection{Graphic and numeric representation}
We are going to see it in the next script of R (the graphic representation will be done by a box plot).

\newpage

Numeric representation.
\begin{lstlisting}
Amp=c(20988, 21267, 21121, 21006, 20829, 20653, 20606, 20583, 20575, 20575)
Bor=c(21249, 20926, 21136, 20970, 20766, 20814, 20734, 20717, 20717, 20717)
Dan=c(21193, 21275, 21088, 20958, 20819, 20719, 20610, 20610, 20610, 20610)
Emi=c(20796, 21310, 21111, 20999, 20806, 20790, 20790, 20790, 20790, 20790)
Jos=c(21304, 21174, 21081, 21049, 21049, 21049, 21049, 21049, 21049, 21049)
Mar=c(21324, 21032, 20861, 20918, 20807, 20791, 20791, 20791, 20791, 20791)
Raq=c(21293, 21071, 21207, 21076, 21139, 21028, 20889, 20734, 20651, 20728)
Vir=c(21128, 21008, 21023, 20983, 20727, 20689, 20721, 20568, 20545, 20535)
datos=cbind(Amp,Bor,Dan,Emi,Jos,Mar,Raq,Vir)
rownames(datos)=c("1","2","3","4","5","6","7","8","9","10")
datos<-t(datos) #traspuesta

#mean
medias<-apply(datos,2,mean) #medias para cada tenure

#summary
repnum=apply(datos,2,summary) 
varianza<-apply(datos,2,var)
std=varianza^(1/2)
\end{lstlisting}

We have got the next outputs:

\begin{lstlisting}
> means
       1        2        3        4        5        6        7        8        9       10 
21159.38 21132.88 21078.50 20994.88 20867.75 20816.62 20773.75 20730.25 20716.00 20724.38 
> repnum
               1        2        3        4        5        6        7        8        9
Min.    20796.00 20926.00 20861.00 20918.00 20727.00 20653.00 20606.00 20568.00 20545.00
1st Qu. 21093.00 21026.00 21066.50 20967.00 20796.00 20711.50 20693.25 20603.25 20601.25
Median  21221.00 21122.50 21099.50 20991.00 20813.00 20790.50 20762.00 20725.50 20684.00
Mean    21159.38 21132.88 21078.50 20994.88 20867.75 20816.62 20773.75 20730.25 20716.00
3rd Qu. 21295.75 21269.00 21124.75 21016.75 20884.00 20867.50 20815.50 20790.25 20790.25
Max.    21324.00 21310.00 21207.00 21076.00 21139.00 21049.00 21049.00 21049.00 21049.00
              10
Min.    20535.00
1st Qu. 20601.25
Median  20722.50
Mean    20724.38
3rd Qu. 20790.25
Max.    21049.00
\end{lstlisting}

\newpage

\begin{lstlisting}

> std
        1         2         3         4         5         6         7         8         9 
184.17068 143.25246 102.23502  50.25773 145.39282 147.66849 145.85879 156.53366 163.25878 
       10 
162.76709 
\end{lstlisting}
We see that the best \textit{tenures} are the smallest ones because it has a bigger mean. 

GRAPHIC REPRESENTATION
\begin{lstlisting}
#Box plot
par(mfrow=c(1,5))
boxplot(datos[,1],xlab="tenure=1", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,2],xlab="tenure=2", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,3],xlab="tenure=3", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,4],xlab="tenure=4", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,5],xlab="tenure=5", ylab="f.o.",ylim=c(20530,21350))

par(mfrow=c(1,5))
boxplot(datos[,6],xlab="tenure=6", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,7],xlab="tenure=7", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,8],xlab="tenure=8", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,9],xlab="tenure=9", ylab="f.o.",ylim=c(20530,21350))
boxplot(datos[,10],xlab="tenure=10", ylab="f.o.",ylim=c(20530,21350))
\end{lstlisting}
We have got:

\begin{figure}[h!]
        \begin{minipage}[b]{0.5\linewidth} 
        \centering
        \includegraphics[width=9cm]{Rplot02.eps}
         \label{figura9}
        \end{minipage}
        \hspace{0.5cm} % If we want a space between graphics.
        \begin{minipage}[b]{0.5\linewidth}
        \centering
        \includegraphics[width=9cm]{Rplot03.eps} \label{figura9amp}
        \end{minipage}
\end{figure}
We observe that the biggest \textit{tenure} has worse results.  \\
The \textit{tenure} $1$ and $2$ seems to be the best \textit{tenure} for our tabu, because with them we obtain the best value of our objective function. \\

\subsubsection{Hypothesis contrasting.}
Now we will have done the hypothesis contrasting, which our null hypothesis $H_0$ is $\mu_1=\mu_2=\mu_3=\mu_4=\mu_5=\mu_6=\mu_7=\mu_8=\mu_9=\mu_{10}$, and our alternative hypothesis $H_A$ is that there are differences between the means. With our ANOVA's knowledge, we can apply it because our data don't have the conditions for the applicability, for example, we don't have independence between variables because we share examples. We have to use a non parametric contrasting like the Test of Friedman o Kruskal-Wallis. We are in front of a multiple contrasting with the same means. We are going to see it in the next script of R.

\begin{lstlisting}
> friedman.test(datos) #non parametric test

	Friedman rank sum test

data:  datos
Friedman chi-squared = 61.511, df = 9, p-value = 6.85e-10
\end{lstlisting}
p-value = $6.85e-10 < 0.5$ we have enough evidence with $95\%$  that we can't assume that the means are the same.  Moreover, the test finds enough evidences between at least nine groups.\\
Joint it with the graphic description, numeric description and the test, we have determined that our best \textit{tenure} for the Tabu \textit{First} is $1$.

\subsection{Tabu Search (\textit{Best}) and its parameter \textit{tenure}.}
We have studied forty \textit{tenure} (from 1 to 40) the same as Tabu \textit{First}. The output is a big table so we have chosen the best of them to let the reader see how it works.

\begin{lstlisting}
> means
 [1] 21033.25 21008.62 21063.25 21087.12 21040.12 21172.12 21219.25 21184.12 21198.25
[10] 21205.25 21217.88 21179.62 21215.38 21182.62 21180.75 21147.50 21155.00 21195.00
[19] 21192.88 21155.75 21192.88 21153.50 21215.38 21176.25 21163.62 21186.00 21198.25
[28] 21148.88 21137.50 21210.38 21154.12 21152.00 21160.62 21153.12 21159.75 21189.38
[37] 21102.25 21135.12 21120.88 21136.38
> order(means)
 [1]  2  1  5  3  4 37 39 38 40 29 16 28 32 34 22 31 17 20 35 33 25  6 24 12 15 14  8 26
[29] 36 19 21 18  9 27 10 30 13 23 11  7
\end{lstlisting}
Our better \textit{tenures} were: $7, 9, 10, 11, 13, 18, 21, 23, 27, 30$; then our table will have the best ten \textit{tenure} that we represent it as shown:

\begin{table}[H]
  \begin{center}
    \label{tab:table1}
    \begin{tabular}{|c|c|c|c|c|c|c|c|c|} 
    \hline
      $\textit{tenure}$ & Amparo & Borja & Daniel & Emilio & Jose & Maria Jesús & Raquel & Virginia\\
      \hline
     7 & 21218 & 21367 & 21217 & 21293 & 21107 & 21022 & 21323 & 21207\\ \hline
     9 & 21151 & 21362 & 21224 & 21158 & 21062 & 21211 & 21167 & 21251\\ \hline
     10 & 21121 & 21288 & 21152 & 21239 & 21161 & 21121 & 21370 & 21190\\ \hline
     11 & 21182 & 21266 & 21188 & 21325 & 21170 & 21140 & 21257 & 21215\\ \hline
     13 & 21153 & 21198 & 21156 & 21344 & 21092 & 21209 & 21325 & 21246\\ \hline
     18 & 21194 & 21182 & 21082 & 21274 & 21243 & 21121 & 21363 & 21101\\ \hline
     21 & 21184 & 21223 & 21040 & 21260 & 21164 & 21114 & 21273 & 21285\\ \hline
     23 & 21250 & 21170 & 21320 & 21305 & 21050 & 21123 & 21214 & 21291\\ \hline
     27 & 21322 & 21326 & 21065 & 21185 & 21247 & 21085 & 21185 & 21171\\ \hline
     30 & 21233 & 21108 & 21099 & 21379 & 21331 & 21148 & 21124 & 21261\\

\hline
    \end{tabular}
    \caption{2 minutes for every example.}
  \end{center}
\end{table}
In our statistical analysis, the response variable and the factor of study are the following:

\textbf{Response variable (r.v.):} value of the objective function in our Tabu \textit{Best}. \\
\textbf{Factor:} \textit{tenure}

We answer the question: There is any significant difference between the mean of the r.v. for our chosen $\textit{tenure}$?
We will have done the statistical analysis with \textit{software} R. Our statistical study will divide in two parts:
1) Graphic representation and numeric data and 2) hypothesis contrasting. \\

\subsubsection{Graphic representation and numeric data}
We are going to see it in the next script of R (the graphic representation will be done by a box plot).

NUMERIC REPRESENTATION
\begin{lstlisting}
datosb=cbind(datos[,7],datos[,9],datos[,10],datos[,11],datos[,13],datos[,18],datos[,21],datos[,23],
datos[,27],datos[,30])
rownames(datosb)=c("Amp","Bor","Dan","Emi","Jos","Mar","Raq","Vir")
colnames(datosb)=c("7","9","10","11","13","18","21","23","27","30")

#means
medias<-apply(datos,2,mean) 

#summary
repnum=apply(datos,2,summary) #NUMERIC REPRESENTATION
varianza<-apply(datos,2,var)
std=varianza^(1/2)
\end{lstlisting}

We have got the following tables:

\newpage

\begin{lstlisting}
> means
       7        9       10       11       13       18       21       23       27       30 
21219.25 21198.25 21205.25 21217.88 21215.38 21195.00 21192.88 21215.38 21198.25 21210.38 
> repnum
               7        9       10       11       13       18       21       23       27
Min.    21022.00 21062.00 21121.00 21140.00 21092.00 21082.00 21040.00 21050.00 21065.00
1st Qu. 21182.00 21156.25 21144.25 21179.00 21155.25 21116.00 21151.50 21158.25 21149.50
Median  21217.50 21189.00 21175.50 21201.50 21203.50 21188.00 21203.50 21232.00 21185.00
Mean    21219.25 21198.25 21205.25 21217.88 21215.38 21195.00 21192.88 21215.38 21198.25
3rd Qu. 21300.50 21230.75 21251.25 21259.25 21265.75 21250.75 21263.25 21294.50 21265.75
Max.    21367.00 21362.00 21370.00 21325.00 21344.00 21363.00 21285.00 21320.00 21326.00
              30
Min.    21099.00
1st Qu. 21120.00
Median  21190.50
Mean    21210.38
3rd Qu. 21278.50
Max.    21379.00
> std
        7         9        10        11        13        18        21        23        27 
113.19862  87.68083  88.11640  60.78871  86.58594  95.63323  85.20972  95.49860  96.86920 
       30 
107.10200 
\end{lstlisting}
We can see that \textit{tenure} equal to 7 (Fred Glover said that seven is \textbf{the magic number}) is the one with the biggest mean, and the others are 11, 23, 13, 30 y 10 (all of them hit the score of 21200 in the objective function).

Graphic representation
\begin{lstlisting}
#Box plot
par(mfrow=c(1,5))
boxplot(datosb[,1],xlab="tenure=7", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,2],xlab="tenure=9", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,3],xlab="tenure=10", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,4],xlab="tenure=11", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,5],xlab="tenure=13", ylab="f.o.",ylim=c(21020,21380))

par(mfrow=c(1,5))
boxplot(datosb[,6],xlab="tenure=18", ylab="f.o.",ylim=c(21020,21380)) 
boxplot(datosb[,7],xlab="tenure=21", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,8],xlab="tenure=23", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,9],xlab="tenure=27", ylab="f.o.",ylim=c(21020,21380))
boxplot(datosb[,10],xlab="tenure=30", ylab="f.o.",ylim=c(21020,21380))
\end{lstlisting}
We have obtained:

\begin{figure}[h!]
        \begin{minipage}[b]{0.5\linewidth} 
        \centering
        \includegraphics[width=9cm]{Rplot04.eps}
         \label{figura9}
        \end{minipage}
        \hspace{0.5cm} 
        \begin{minipage}[b]{0.5\linewidth}
        \centering
        \includegraphics[width=9cm]{Rplot05.eps} \label{figura9amp}
        \end{minipage}
\end{figure}
We can observe that the \textit{tenure} equal to 7 has minimum value between all the \textit{tenure}, however it is the one with have the best mean if we join all the examples together. The worst thing with the number 7, as we can see, is the diversity of its solutions. However, the number 10, although doesn't have better mean as 7, it extrems aren't be very striking.

It seems that \textit{tenure} equal to 7 or 10 are one of the best of the tenure that we can find.
\subsubsection{Hypothesis contrasting.}
Now we will have done the hypothesis contrasting, which our null hypothesis $H_0$ is $\mu_7=\mu_9=\mu_{10}=\mu_{11}=\mu_{13}=\mu_{18}=\mu_{21}=\mu_{23}=\mu_{27}=\mu_{30}$, and our alternative hypothesis $H_A$ is that there are differences between the means. With our ANOVA's knowledge, we can apply it because our data don't have the conditions for the applicability, for example, we don't have independence between variables because we share examples. We should use a non parametric contrasting like the Test of Friedman o Kruskal-Wallis. We are in front of a multiple contrasting with the same means. We are going to see it in the next script of R.

\begin{lstlisting}
> friedman.test(datosb) #non parametric test

	Friedman rank sum test

data:  datosb
Friedman chi-squared = 3.091, df = 9, p-value = 0.9606
\end{lstlisting}
p-value = $0.9606 > 0.05$ we don't have enough evidence so we can't reject the null hypothesis that our means are the same. \\
Joinly with graphic and numeric description and with the test, we can determinate that our best value of \textit{tenure} for the Tabu \textit{Best} is $7$.

\section{Comparative between GRASP and TABU.}
In the next section we are going to discus our target from the beggining, see what is the best algorithms, GRAPS or Tabu Search (\textit{First} or \textit{Best}). The data of the different examples are based on the election of the parameters $\alpha$ and two \textit{tenures} which we have calibrated and we laid down in the following table (we need to remember that we got $\alpha=0.9$, Tabu \textit{First} \textit{tenure} equal to 1 and Tabu \textit{Best} \textit{tenure} equal to 7). 
\begin{table}[h]
  \begin{center}
    \label{tab:table1}
    \begin{tabular}{|c|c|c|c|c|c|c|c|c|} 
    \hline
      Algorithm & Amparo & Borja & Daniel & Emilio & Jose & Maria Jesús & Raquel & Virginia\\
      \hline
     GRASP & 22080 & 22118 & 21895 & 22045 & 21951 & 22035 & 22008 & 21910\\ \hline
     Tabu \textit{First} & 20988 & 21249 & 21193 & 20796 & 21304 & 21324 & 21293 & 21128\\ \hline
     Tabu \textit{Best} & 21218 & 21367 & 21217 & 21293 & 21107 & 21022 & 21323 & 21207\\
\hline
    \end{tabular}
    \caption{The 3 algorithms with their better parameters.}
  \end{center}
\end{table}
In our statistical analysis, the response variable and the factor of study are the following: \\





\textbf{Response variable (r.v.):} value of the objective function with our 3 algorithms. \\
\textbf{Factor:} Methods

We answer the question: There is any significant difference between the mean of the r.v. between our 3 algorithms?
We will have done the statistical analysis with \textit{software} R. Our statistical study will divide in two parts(as always):
1) Graphic representation and numeric data and 2) hypothesis contrasting. \\

\subsubsection{Graphic and numeric representation}

We are going to see it in the next script of R (the graphic representation will be done by a box plot). \\

REPRESENTACIÓN NUMÉRICA
\begin{lstlisting}
Amp=c(22080,20988,21218)
Bor=c(22118,21249,21367)
Dan=c(21895,21193,21217)
Emi=c(22045,20796,21293)
Jos=c(21951,21304,21107)
Mar=c(22035,21324,21022)
Raq=c(22008,21293,21323)
Vir=c(21910,21128,21207)
datos=cbind(Amp,Bor,Dan,Emi,Jos,Mar,Raq,Vir)
rownames(datos)=c("Grasp","T. First","T. Best")

datos<-t(datos) #traspuesta


#means
medias<-apply(datos,2,mean) #medias para cada metodo

#summary
repnum=apply(datos,2,summary)
varianza<-apply(datos,2,var)
std=varianza^(1/2)
\end{lstlisting}

We have got the following tables:

\begin{lstlisting}
> medias
   Grasp T. First  T. Best 
22005.25 21159.38 21219.25 
> repnum
           Grasp T. First  T. Best
Min.    21895.00 20796.00 21022.00
1st Qu. 21940.75 21093.00 21182.00
Median  22021.50 21221.00 21217.50
Mean    22005.25 21159.38 21219.25
3rd Qu. 22053.75 21295.75 21300.50
Max.    22118.00 21324.00 21367.00
> std
    Grasp  T. First   T. Best 
 80.12802 184.17068 113.19862 
\end{lstlisting}
As we can see the best algorithm is the GRASP. It is the best with differences, in the conclusions we say why that can happen.\\

Graphics representation.
\begin{lstlisting}
#Box plot
par(mfrow=c(1,3))
boxplot(datos[,1],xlab="Grasp", ylab="f.o.",ylim=c(20790,22120))
boxplot(datos[,2],xlab="T. First", ylab="f.o.",ylim=c(20790,22120))
boxplot(datos[,3],xlab="T. Best", ylab="f.o.",ylim=c(20790,22120))
\end{lstlisting}
\newpage
We have obtained:

\includegraphics[width=15cm]{Rplot06.eps}
\\ Again, we see that the GRASP algorithm is better than any Tabu Search. From the other side, Tabu \textit{Best} has shown better results than Tabu \textit{First}. We are going to compare this two by an hypothesis contrasting.


\subsubsection{Hypothesis contrasting.}
Now, we will have done the hypothesis contrasting, which our null hypothesis $H_0$ is$\mu_{G}=\mu_{TF}=\mu_{TB}$, and our alternative hypothesis $H_A$ is that there are differences between the means, as the previous ones. We are going to use a non parametric contrasting like the Test of Friedman o Kruskal-Wallis. We are in front of a multiple contrasting with the same means. We are going to see it in the next script of R.

\begin{lstlisting}
> friedman.test(datos) 

	Friedman rank sum test

data:  datos
Friedman chi-squared = 13, df = 2, p-value = 0.001503
\end{lstlisting}

p-value = $0.001503 < 0.05$ we have enough evidence with $95\%$  that we can't assume that the means are the same. \\
Jointly with graphic and numeric description and with the test, we can determinate that GRASP has been better than Tabu Search. We are going to prove why in these examples GRASP is better than any Tabu that we ran.

\section{Conclusions and conjecture.}
Our algorithm GRASP,which we have done with our teacher Rafael Martí, has resulted to be the winner. One possibility is that we have a good implementation of the construction of the solution and a good implementation of the local search. \\

However, with the same computational time (two minutes), Tabu Search \textit{First} can't hit the score of 21000 and the GRASP was near to 22000. We thought that this can happen because our algorithm get worse more time that it improve. It can be because Tabu Search \textit{First} works like that. It try to improve, if we get an element which improves the value of the solution, we add it to the new solution without taken on count that we can get better improves if we explore more elements of the neighborhood. \\

Then, we are going to check our conjecture: How many time Tabu \textit{First} get worse and improve in an example with different values of the parameter \textit{tenure}? \\
For this question we have designed a code with shows a counter which will give us the number of times that our algorithm gets worse and our algorithm improves. \\
Later we ran it in many examples and we have seen that our Tabu \textit{First} improve more times than it get worse, then, what is wrong with the algorithm? \\
For answer this questions we designed a code which show us how many times our problem improve, or get worse the objective function with a determinate number, i.e., we are looking for the improves whom increments the objective function in a determinate number (for example, 10). \\
We got what we were looking for, Tabu \textit{First} improve more times but when it improve, it is in a little quantities (this is the problem that we get with Tabu \textit{First} and this is why Tabu \textit{Best} is better in this examples).However, when it get worse, it take bigger quantities, we represent this in the following table (we only let it 30 seconds of computation) \\

We denote M$+i$ as the number of times that the algorithm improve more than $i$ units the objective function and we denote E$+i$ as the number of time that the algorithm get worse more than $i$ units the objective function.

\begin{table}[H]
  \begin{center}
    \label{tab:table1}
    \begin{tabular}{|c|c|c|c|c|c|c|c|c|} 
    \hline
      $\textit{Tenure}$ & Improves & Worsening & M$+20$ & E$+20$ & M$+50$ & E$+50$ & M$+90$ & E$+90$\\
      \hline
     2 & 417130412 & 392410467 & 208503 & 173990 & 170784 & 154659  & 86272 & 119104\\ \hline
     3 & 534696743 & 370918863 & 292731 & 200510 & 152824 & 167668 & 97066 & 106448\\ \hline
     4 & 569957079 & 375361296 & 261709 & 200469 & 158381 & 155252 & 82677 & 103590\\

\hline
    \end{tabular}
    \caption{Improves and Worsening in the example "Amparo"}
  \end{center}
\end{table}

As we had said, there are more improves than worsening for the \textit{tenure} equal to 2, 3 and 4 as table shows us. The improves are small amounts as we can see in the column $E+20$, and the worsenings are bigger. \\
In essence: There are more improve than worsening but the improve doesn't contribute very much in the objective function while the worsening has a big contribution in the objective function (we can see in the column $E+90$ which means that there are 119104 worsening which get the objective function decrease more than 90 pints of value while improve there are 86272). That's why our algorithm doesn't get better solutions. Our conjecture has been solved.


%uoooooooooooooooo tumadreuooooooooooooooooooo UOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
%AL FIN SE TERMINA ESTA PUTA MIERDA!!!!
%USEGREAS OSTOJEOGIRN ojeogiek


\end{document}
Maximum diversity problem

联系