[Dr.Lib]Note:Algorithm – Time Complexity of Algorithms

A legendary coder says there are three kinds of beauty in this world: exquisite and elegant data structures, ingeniously crafted algorithms, and your smile that warms the world.

Computers are called computers because they can do one thing, computation, faster than humans. Our job is to tell the computer how to carry out a computation, that is, to implement an algorithm.
To make a program run faster, besides using a faster processor, what matters most is improving the efficiency of the algorithm and the speed of its implementation on that machine.
The same task can be done in countless ways; sorting a few numbers, for example, can be done by quite a few different methods. So how do we compare their speeds?
The most intuitive approach is to write out the corresponding programs, feed them an input, and compare their running times. Simple and crude, but it tells us little more about the algorithms: it only reflects their behavior on that one particular input, and the measurement error is far from negligible.
The more rigorous approach is to analyze the algorithm mathematically and derive its theoretical efficiency.
Let's take selection sort as our example, er, chestnut.
(And then peel the chestnut and eat it~)

Let's see how it runs.
Selection sort:
Round 0: find the minimum among all n elements, one comparison for each \(j\in \left ( 0,n \right )\): n-1 comparisons in total.
Round 1: find the minimum among the remaining n-1 elements, one comparison for each \(j\in \left ( 1,n \right )\): n-2 comparisons.
……
Round i: find the minimum among the remaining n-i elements, one comparison for each \(j\in \left ( i,n \right )\): n-i-1 comparisons.
Round n-2: find the minimum of the remaining 2 elements: one comparison.

In total, \(\sum_{k = 1}^{n-1} k=\frac{n\left ( n-1 \right )}{2}\) comparisons are needed.
Each round then performs one swap, for \(\left ( n-1 \right )\) swaps in total.
In other words, the time this program needs to run is roughly
\(T=\frac{n\left ( n-1 \right )}{2}T_{0}+\left ( n-1 \right )T_{1}\), where \(T_{0}\) is the time for one comparison and \(T_{1}\) the time for one swap.
When n is large, what determines the magnitude of T?
(Remember limits of functions at infinity?)
As n tends to infinity, the quadratic term dominates, namely \(\frac{n^{2}}{2}T_{0}\).
So we can see that for sufficiently large n, the running time of selection sort is roughly proportional to \(n^{2}\).
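
As a concrete reference, here is a minimal C++ selection sort matching the analysis above (a sketch, not tuned code):

```cpp
#include <utility>
#include <vector>

// Round i scans the remaining n - i elements for the minimum
// (n - i - 1 comparisons), then performs one swap: n(n-1)/2
// comparisons and n - 1 swaps in total, as counted above.
void selection_sort(std::vector<int>& a) {
    int n = (int)a.size();
    for (int i = 0; i + 1 < n; ++i) {    // rounds 0 .. n-2
        int min_idx = i;
        for (int j = i + 1; j < n; ++j)  // one comparison per j in (i, n)
            if (a[j] < a[min_idx]) min_idx = j;
        std::swap(a[i], a[min_idx]);     // one swap per round
    }
}
```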

[Dr.Lib]Note:Algorithm – Mo's Algorithm

Mo's algorithm: for two range queries [l1,r1] and [l2,r2], if extending or shrinking the current range by one element can be done in O(1),
then moving from [l1,r1] to [l2,r2] by brute force costs |l1-l2|+|r1-r2|, which is exactly the Manhattan distance between the two points.
Mo's paper has never turned up, so this is only a rough understanding: apparently he proved that ordering the queries along a Hamiltonian path is optimal.
But a Manhattan MST can be used to approximate such a path (to within roughly a factor of two), and Mo showed that the total transition cost under this ordering is bounded by O(n*sqrt(n)).
So for modification-free range queries:
view all the query intervals as points in the 2D plane, compute a Manhattan MST once, and then process the queries in the order given by the tree,
handling the transition between each pair of adjacent queries by brute force.

Via http://blog.csdn.net/huzecong/article/details/8576908

http://blog.csdn.net/acm_cxlove/article/details/8894431

Thanks to the authors.

This algorithm is said to have been proposed by 莫涛 (Mo Tao; Orz!), but no related material could be found anywhere online, and I only learned it by asking pty. It is an offline algorithm for a class of range query problems without modification, and its core is to order the queries using the Manhattan-distance minimum spanning tree algorithm. Take the following example (Tsinsen A1206, 《小Z的袜子》 "Little Z's Socks", a problem set by Mo himself):

Given a sequence of length N in which each element's value is its color, answer M queries; each asks for the probability that two elements drawn at random from a given interval have the same color.

The answer to a query [l,r] is \(\frac{\sum_{i}\binom{c_{i}}{2}}{\binom{r-l+1}{2}}\), where \(c_{i}\) is the number of occurrences of the i-th color inside the interval. Brute force is obviously O(NM), and the usual range-query techniques do not seem to apply.

Consider a simplified problem first: every query interval has left endpoint 1. Then we can sort the queries by right endpoint. Suppose the answer for [1,r] has been computed and we move to [1,r+k], i.e. add k elements; this takes O(k). Since the right endpoint only ever moves forward, processing all the queries (ignoring the sort) costs O(N) in total.

What about moving from [l,r] to [l',r']? The cost is O(|r'-r|+|l'-l|), which is exactly the Manhattan distance from the point (l,r) to the point (l',r'). So if we turn all the queries into points in the 2D plane, build a Manhattan-distance minimum spanning tree, and process the queries in the order given by the tree, the total transition cost is minimized. It can be shown (I can't prove it... supposedly it is in Mo's paper) that this runs in O(N^{1.5}). And that solves the problem.
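
In practice, the MST ordering is rarely what people implement: sorting queries by block of left endpoint, then by right endpoint, achieves the same O(N^{1.5}) bound and is much simpler. Below is a sketch of that block-sorting variant for the socks problem; the I/O format and the assumption that colors lie in [1, n] are mine, not from the original problem statement.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, m;
    scanf("%d%d", &n, &m);
    vector<int> c(n + 1);
    for (int i = 1; i <= n; i++) scanf("%d", &c[i]);  // assumes colors in [1, n]
    struct Query { int l, r, id; };
    vector<Query> qs(m);
    for (int i = 0; i < m; i++) { scanf("%d%d", &qs[i].l, &qs[i].r); qs[i].id = i; }

    int B = max(1, (int)sqrt((double)n));             // block size ~ sqrt(N)
    sort(qs.begin(), qs.end(), [&](const Query& a, const Query& b) {
        if (a.l / B != b.l / B) return a.l / B < b.l / B;
        return a.r < b.r;
    });

    vector<long long> cnt(n + 1, 0);
    long long cur = 0;  // sum over colors of C(cnt_i, 2) in the current window
    auto add = [&](int i) { cur += cnt[c[i]]++; };    // gain cnt old pairs
    auto del = [&](int i) { cur -= --cnt[c[i]]; };    // lose cnt new pairs

    vector<pair<long long, long long>> ans(m);
    int L = 1, R = 0;
    for (const auto& q : qs) {
        while (R < q.r) add(++R);
        while (L > q.l) add(--L);
        while (R > q.r) del(R--);
        while (L < q.l) del(L++);
        long long len = q.r - q.l + 1;
        long long num = cur, den = len * (len - 1) / 2;
        if (num == 0) ans[q.id] = {0, 1};
        else { long long g = __gcd(num, den); ans[q.id] = {num / g, den / g}; }
    }
    for (auto& a : ans) printf("%lld/%lld\n", a.first, a.second);
}
```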

[Dr.Lib]Note:Algorithm – Kruskal
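
A minimal sketch of Kruskal's algorithm with a path-compressing union-find (all names are mine): sort the edges by weight, then greedily keep every edge that joins two different components, for O(E log E) overall.

```cpp
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> p;
    DSU(int n) : p(n) { iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {            // returns false if already connected
        a = find(a); b = find(b);
        if (a == b) return false;
        p[a] = b; return true;
    }
};

struct Edge { int u, v; long long w; };

long long kruskal(int n, vector<Edge> es) {
    sort(es.begin(), es.end(),
         [](const Edge& a, const Edge& b) { return a.w < b.w; });
    DSU dsu(n);
    long long total = 0;
    for (const Edge& e : es)
        if (dsu.unite(e.u, e.v)) total += e.w;  // edge joins two components
    return total;                               // MST weight (graph assumed connected)
}
```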


[Dr.Lib]Note:Math – Extended Baby Step Giant Step

Math write-ups like this one are all reposted from AC大神 (AekdyCoin)... and from Seter.

http://hi.baidu.com/aekdycoin/item/236937318413c680c2cf29d4

http://seter.is-programmer.com/posts/32188.html

Baby Step Giant Step

\[A^{x}\equiv B \pmod{C},\quad 0\leq x <C,\ C\in \mathbb{P}\]

To solve for x, we can rewrite it equivalently as

\[x = i\cdot m + j\quad \left ( 0 \leq i,j< m \right ),\qquad m = \lceil \sqrt{C} \rceil\]

and the sole purpose of this decomposition is to turn the equation into:

\[(A^{m})^{i} \cdot A^{j} \equiv B \pmod{C}\]

After that, a little brute force finishes the job:
(1) For j = 0 -> m, insert \( Hash (A^j \bmod C,\ j) \) into a hash table.

(2) Enumerate i; for each i, let \( {A}' = (A^m)^i \bmod C\).
We then have \[A' \cdot A^{j} \equiv B \pmod{C}\]

Clearly \(A'\), \(B\), \(C\) are all known, and since C is prime (and assuming \(C \nmid A\)), we have \((A', C) = 1\).

Let \(p=(A^{m})^{i} \bmod C\) and \(x=A^{j} \bmod C\); then \(p\cdot x\equiv B \pmod{C}\).

We enumerate i from 0 until \(i\cdot m\) exceeds \(C-1\); at each step p is known and we must find x.

Since \((A',C)=1\), this modular equation has a unique solution modulo C (it can be found with the extended Euclidean algorithm or Euler's theorem).

For that unique solution x, look up (x, j) in the hash table; if it is found, return \(i\cdot m+j\).

If a solution with \(x > 0\) is required, simply return only when \(i\cdot m+j>0\) in the step above.
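
A compact C++ sketch of the procedure above, assuming C is prime and small enough that products fit in 64 bits. One deviation from the write-up: instead of solving \(A' \cdot x \equiv B \pmod C\) with the extended Euclidean algorithm at each step, it multiplies B by the inverse of \(A^m\) once per giant step, with the inverse obtained from Fermat's little theorem. Function and variable names are mine.

```cpp
#include <bits/stdc++.h>
using namespace std;

long long pow_mod(long long a, long long e, long long mod) {
    long long r = 1 % mod;
    for (a %= mod; e > 0; e >>= 1, a = a * a % mod)
        if (e & 1) r = r * a % mod;
    return r;
}

// Smallest x >= 0 with A^x = B (mod C), C prime; -1 if none exists.
long long bsgs(long long A, long long B, long long C) {
    A %= C; B %= C;
    if (B == 1 % C) return 0;
    if (A == 0) return B == 0 ? 1 : -1;
    long long m = (long long)ceil(sqrt((double)C));
    unordered_map<long long, long long> table;   // baby steps: A^j -> j
    long long aj = 1 % C;
    for (long long j = 0; j < m; j++) {
        if (!table.count(aj)) table[aj] = j;     // keep the smallest j per value
        aj = aj * A % C;
    }
    long long invAm = pow_mod(pow_mod(A, m, C), C - 2, C);  // Fermat: C prime
    long long b = B;                             // b = B * (A^-m)^i
    for (long long i = 0; i <= m; i++) {
        auto it = table.find(b);
        if (it != table.end()) return i * m + it->second;   // A^(i*m+j) = B
        b = b * invAm % C;
    }
    return -1;
}
```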

Extended Baby Step Giant Step

\[A^{x}\equiv B \pmod{C},\quad 0\leq x <C,\ C\in \mathbb{N}^{*}\]

Can the method be applied directly when C is not prime? Of course not... the most immediate problem is that the inverse need not exist!

Consider a \(G'\) that divides A, B and C simultaneously, and let \(B'=\frac{B}{G'},C'=\frac{C}{G'}\). Then, for \(x \neq 0\),

\(\frac{A}{G'}\cdot A^{x-1}\equiv B'\pmod{C'}\).

Doing this a few times strips factors out of C until \((A,C')=1\).

So how is \(G'\) chosen? AC大神 tells you: keep taking \(G'=\gcd(A,C')\) until \(G'=1\). If any \(G'\) fails to divide the current \(B'\), there is no solution.

Suppose \(G'\) was taken r times and the product of all the \(G'\) is G. The problem becomes \(\frac{A^{r}}{G}\cdot A^{x-r}\equiv B'\pmod{C'}\). Let \(D=\frac{A^{r}}{G}\) (always an integer); since \((A,C')=1\) now holds, \((D,C')=1\) as well.

Then the algorithm above needs only one change: compute the inverse of \(D\) modulo \(C'\) instead, and add r to the answer it returns.

One small issue remains: the answer obtained this way is at least r. But even if \(G'=2\) every time, r is at most \(\log_{2}C\), so the exponents below r can simply be checked by brute force.
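
A sketch of the extended version following the reduction above (names mine; it reuses pow_mod from the previous sketch and assumes \(C^2\) fits in 64 bits). The small exponents x < r are handled by the initial brute-force loop, exactly as argued; instead of taking the inverse of D, the giant steps hash \(B \cdot A^j\) and probe with \(D \cdot A^{im}\), which is valid once \(\gcd(A,C')=1\).

```cpp
// Reuses pow_mod from the BSGS sketch above.
// Returns some x >= 0 with A^x = B (mod C) for arbitrary C >= 1, or -1.
long long ex_bsgs(long long A, long long B, long long C) {
    A %= C; B %= C;
    if (B == 1 % C) return 0;
    long long t = 1 % C;
    for (long long x = 1; x <= 64; x++) {  // covers all answers x < r (r <= log2 C)
        t = t * A % C;
        if (t == B) return x;
    }
    long long D = 1 % C, r = 0, g;
    while ((g = __gcd(A, C)) > 1) {        // strip common factors
        if (B % g) return -1;              // g must divide B, else no solution
        B /= g; C /= g; r++;
        D = D * (A / g) % C;               // D accumulates A/G' factors
    }
    // Now D * A^y = B (mod C) with gcd(A, C) = 1; the answer is x = y + r.
    long long m = (long long)ceil(sqrt((double)C));
    unordered_map<long long, long long> table;
    long long baby = B % C;
    for (long long j = 0; j <= m; j++) {   // baby steps: B * A^j -> j
        table[baby] = j;
        baby = baby * A % C;
    }
    long long Am = pow_mod(A, m, C), giant = D % C;
    for (long long i = 1; i <= m + 1; i++) {
        giant = giant * Am % C;            // giant = D * A^(i*m)
        auto it = table.find(giant);       // hit: D * A^(i*m - j) = B
        if (it != table.end()) return i * m - it->second + r;
    }
    return -1;
}
```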

[Dr.Lib]Note:Algorithm – Manhattan minimum spanning tree

Via http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lineSweep

We first break this down into a simpler problem. Standard MST algorithms for general graphs (e.g., Prim's algorithm) can compute the MST in O((E + N) log N) time for E edges. If we can exploit geometric properties to reduce the number of edges to O(N), then this is merely O(N log N). In fact we can consider, for each point P, only its nearest neighbors in each of the 8 octants of the plane (see the figure below). The figure shows the situation in just one of the octants, the West-Northwest one. Q is the closest neighbour (with the dashed line indicating points at the same Manhattan distance as Q), and R is some other point in the octant. If PR is an edge in a spanning tree, then it can be removed and replaced by either PQ or QR to produce a better spanning tree, because the shape of the octant guarantees that |QR| ≤ |PR|. Thus, we do not need to consider PR when building the spanning tree.

(Figure: the eight octants around P)

This reduces the problem to that of finding the nearest neighbour in each octant. We'll just consider the octant shown; the others are no different and can be handled by symmetry. It should be clear that within this octant, finding the nearest neighbour is equivalent to just finding the point with the largest value of x − y, subject to an upper bound on x + y and a lower bound on y, and this is the form in which we'll consider the problem.

Now imagine for the moment that the lower bound on y did not exist. In this case we could solve the problem for every P quite easily: sweep through the points in increasing order of x + y, and Q will be the point with the largest x − y value of those seen so far. This is where the divide-and-conquer principle comes into play: we partition the point set into two halves with a horizontal line, and recursively solve the problem for each half. For points P in the upper half, nothing further needs to be done, because points in the bottom half cannot play Q to their P. For the bottom half, we have to consider that by ignoring the upper half so far we may have missed some closer points. However, we can take these points into account in a similar manner as before: walk through all the points in x + y order, keeping track of the best point in the top half (largest x − y value), and for each point in the bottom half, checking whether this best top-half point is better than the current neighbour.

So far I have blithely assumed that any set of points can be efficiently partitioned on Y and also walked in x + y order without saying how this should be done. In fact, one of the most beautiful aspects of this class of divide-and-conquer plus line-sweep algorithms is that it has essentially the same structure as a merge sort, to the point that a merge-sort by x + y can be folded into the algorithm in such a way that each subset is sorted on x + y just when this is needed (the points initially all being sorted on Y). This gives the algorithm a running time of O(N log N).
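
Here is a sketch of the one-octant sweep just described, written as the merge-sort fold (all names are mine). For each point P it records the candidate Q maximizing x - y subject to y_Q >= y_P and x_Q + y_Q <= x_P + y_P; under those constraints the Manhattan distance is (x_P - y_P) - (x_Q - y_Q), so the largest x - y is the nearest neighbour. Running it again on reflected copies of the points (e.g. swapping x and y, negating x) covers the remaining octants, and the collected candidate edges then go through Kruskal.

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Pt { long long x, y; int id; };

vector<int> best;            // best[id]: candidate neighbour in this octant (-1 if none)
vector<long long> bestDiag;  // x - y of that candidate

// p[lo, hi) must be sorted by y on entry; it is sorted by x + y on exit.
void sweep(vector<Pt>& p, int lo, int hi) {
    if (hi - lo <= 1) return;
    int mid = (lo + hi) / 2;   // split by a horizontal line
    sweep(p, lo, mid);         // lower half: receives candidates
    sweep(p, mid, hi);         // upper half: supplies candidates
    vector<Pt> tmp; tmp.reserve(hi - lo);
    long long topDiag = LLONG_MIN; int topId = -1;   // best upper point so far
    int i = lo, j = mid;
    while (i < mid || j < hi) {                      // merge in increasing x + y
        if (j < hi && (i == mid || p[j].x + p[j].y <= p[i].x + p[i].y)) {
            if (p[j].x - p[j].y > topDiag) { topDiag = p[j].x - p[j].y; topId = p[j].id; }
            tmp.push_back(p[j++]);
        } else {
            if (topId != -1 && topDiag > bestDiag[p[i].id]) {
                bestDiag[p[i].id] = topDiag;         // nearer candidate found
                best[p[i].id] = topId;
            }
            tmp.push_back(p[i++]);
        }
    }
    copy(tmp.begin(), tmp.end(), p.begin() + lo);
}

// Usage: best.assign(n, -1); bestDiag.assign(n, LLONG_MIN);
// sort the points by y, call sweep(p, 0, n), and collect the edges
// (id, best[id]); repeat for the reflected copies, then run Kruskal.
```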

The idea of finding the closest point within an angle range can also be used to solve the Euclidean MST problem, but the O(N log N) running time is no longer guaranteed in the worst case, because the distance is no longer a linear function of the coordinates. It is actually possible to compute the Euclidean MST in O(N log N) time, because it is a subset of the Delaunay triangulation.