关于log2(x)

写ST表模板时候想到的，要是把预处理所有lg改成31-__builtin_clz应该会快一点？

在Quick C++ Benchmarks试了试，如果是预处理+顺序调用的话（也就是从0到n一个一个调用），预处理快，但是如果说是随机调用顺序，即使无视预处理时间，clz也会更快。

好处就是，简单好写，无额外空间开销，并且支持的范围也更大，对于双精度可以用63-__builtin_clzll(x)，这个应该会慢一点。注意无论哪种x都不能为0，否则是ub。

以下是测试用的代码，func2比func1快1.6倍左右。顺便测试了一下stl内的浮点数log2，那个比这俩慢十几倍。测试链接

CPP

#include <bits/stdc++.h>
using namespace std;

constexpr unsigned low=1,top=1e6-1,siz=1e6;

const auto& nums = [] {
    random_device rd;
    mt19937 gen{rd()};
    array<unsigned, siz> arr{};
    uniform_int_distribution<int>e{low,top};
    for(int i=0;i<arr.size();i++)arr[i]=e(gen);
    return arr;
}();

static void func1(benchmark::State& state) {
    
    for (auto _ : state) {
        int lg[top+1];
        lg[1]=0;
        for(int i=2;i<=top;i++)lg[i]=lg[i>>1]+1;
        for(unsigned i=0;i<=top;i++)
        {
            int res=lg[nums[i]];
            benchmark::DoNotOptimize(res);
        }
    }
}

BENCHMARK(func1);

static void func2(benchmark::State& state) {
    
    for (auto _ : state) {
        for(unsigned i=0;i<=top;i++)
        {
            int res=31-__builtin_clz(nums[i]);
            benchmark::DoNotOptimize(res);
        }
    }
}

BENCHMARK(func2);

讨论操作

回复

相关推荐

关于log2(x)