专栏文章

An easier method of polynomial composition: No more transposition theorem is needed!

算法·理论参与者 63已保存评论 65

文章操作

快速查看文章及其快照的属性,并进行相关操作。

当前评论
65 条
当前快照
2 份
快照标识符
@mi6114so
此快照首次捕获于
2025/11/19 21:17
3 个月前
此快照最后确认于
2025/12/01 21:19
3 个月前
查看原文
Just some random thoughts arised by a simple problem. But it seems that I'm reinventing wheels once more :(
If there's any mistake in the article, please don't hesitate to point it out in the comment.

Introduction

Yesterday night, my friend asked me a question on polynomials (his profile picture is pixelated in order to prevent privacy issues):
(Translation: Ciallo! Is there any solution for [xn]fk(x)[x^n]f^k(x) for all k=1nk=1\sim n in O(n2)\mathcal O(n ^ 2) time complexity that is easy to implement?)
And unfortunately I didn't figure out the solution at first. I even considered it as an unsolvable problem:
(Translation: It's actually unsolvable. I guess, isn't it?)
But I suddenly realized that in the latest algorithm for polynomial composition, we have done a similar process working on fk(x)f^k(x).
So it might be solvable in O(nlog2n)\mathcal O(n\log ^ 2n)?
Why not give it a try?

Solution

We need to get coefficients for every possible kk. Let's introduce another term yy to control the number of ff in a specific product, and rewrite the coefficient as [ykxn]1+yf(x)+y2f2(x)+=[ykxn]11yf(x)[y^kx^n]1+yf(x)+y^2f^2(x)+\dots=[y^kx^n]\frac{1}{1-yf(x)}.
Do you remember how we handled the linear homogeneous recursions using polynomials? We transformed it into an evaluation of a certain coefficient, in the form of a fraction: [xk]P(x)Q(x)[x^k]\frac{P(x)}{Q(x)}. We apply Bostan-Mori algorithm on this fraction afterwards.
For the bivariate polynomial with x,yx, y, we could apply a similar technique. Let's say if we are now finding the coefficient [ykxn]P(x,y)Q(x,y)[y^kx^n]\frac{P(x, y)}{Q(x, y)}. We multiply by Q(x,y)Q(-x, y) on both numerator and denominator to eliminate odd powers of xx on the denominator, resulting in [ykxn]P(x,y)Q(x,y)Q(x2,y)[y^kx^n]\frac{P(x, y)Q(-x, y)}{Q'(x ^ 2, y)}, where Q(x2,y)=Q(x,y)Q(x,y)Q'(x^2, y) = Q(x, y)Q(-x, y).
After this iteration, we can only conserve about half of the elements depending on the parity of nn: if nn is even, we rewrite it as P(x2,y)Q(x2,y)\frac{P'(x^2, y)}{Q'(x^2, y)}, otherwise we rewrite it as xP(x2,y)Q(x2,y)\frac{xP'(x^2, y)}{Q'(x^2, y)}. Now the problem is reduced to a subproblem with same form with the degree of xx halved. When n=0n = 0, the variable xx is entirely eliminated, and we only need to get [yk]P(y)Q(y)[y^k]\frac{P(y)}{Q(y)} using polynomial inversion in O(nlogn)\mathcal O(n\log n) time. (Can be done in O(n)\mathcal O(n), but not necessary.)
Consider the time complexity of the algorithm: after each iteration, although the degree of yy is lifted to twice as before, the degree of xx is halved. Thus, for every polynomial involved in the calculation, the product of degree for x,yx, y is always O(n)\mathcal O(n), so we can multiply two polynomials within O(nlogn)\mathcal O(n\log n) time.
We handled O(logn)\mathcal O(\log n) multiplications, because after this number of iterations we will drop into the corner case of n=0n = 0. Thus, the whole time complexity is O(nlog2n)\mathcal O(n\log ^ 2 n).

Polynomial Composition

Now let us move on to polynomial composition. Suppose we have two polynomials f,gf, g, and a new polynomial g(f(x))=g0+g1f1(x)+g2f2(x)++gnfn(x)g(f(x)) = g_0+g_1f^1(x)+g_2f^2(x)+\dots+g_nf^n(x) is required. (Assume that calculation is done under mod xn+1\bmod \ x^{n + 1}.) The expression seems to have many similarities with the previous question.
We need to assign a coefficient for each single fk(x)f^k(x) depending on g(x)g(x). We can still maintain the variable yy, since we know that the term with yky^k should be multiplied by gkg_k.
Thus, we are actually multiplying the whole expression with g0y0+g1y1+g2y2++gnyng_0y^0+g_1y^{-1}+g_2y^{-2}+\dots+g_ny^{-n} to let every valid term be of y0y^0. With the help of this polynomial, the answer could be written as:
[y0]g(y1)1yf(x)mod xn+1[y^0]\frac{g(y^{-1})}{1-yf(x)}\bmod \ x^{n + 1}
The following step is just the same as we have done in the previous part. Moreover, we need to store every coefficient from x0x^0 to xnx^n, which is slightly different. But an advantage is that the numerator only relates to yy, instead of both x,yx, y.
Suppose we are handling P(y)Q(x,y)\frac{P(y)}{Q(x, y)}. Still multiplying by Q(x,y)Q(-x, y), getting P(y)Q(x,y)Q(x2,y)\frac{P(y)Q(-x, y)}{Q'(x ^ 2, y)}. But the problem is: if we still do casework on the parity of nn, we will get two branches, leading to wrong time complexity as T(m)=2T(m/2)+O(nlogn)T(m)=2T(m/2)+\mathcal O(n\log n).
Can we avoid operations on xx? Putting P(y)Q(x2,y)\frac{P(y)}{Q'(x^2, y)} into the recursion might be a good choice. Now we have the results for P(y)Q(x2,y)\frac{P(y)}{Q'(x^2, y)},and we need to multiply it by Q(x,y)Q(-x, y).
For the first iteration, as we need coefficient of [y0][y ^ 0], we just need to focus on [y0][y^0] and [y1][y ^ {-1}] of P(y)Q(x2,y)\frac{P(y)}{Q'(x^2, y)}, since degy(Q)=1\deg_y(Q) = 1. Now degy(Q)=2\deg_y(Q')=2, and we need [y0],[y1],[y2],[y3][y^0], [y^{-1}],[y^{-2}], [y^{-3}] in the next iteration. So the number of needed values is depending on degy(Q)\deg_y(Q). More specifically, we just need to figure out [ydegy(Q)+1][y0][y^{-\deg_y(Q) + 1}]\sim [y^0] in order to correctly deal with the previous iterations.
As we have said in the previous part, degxdegy=O(n)\deg_x\deg_y = \mathcal O(n), and degy(Q)=k=0lognO(2k)=O(n)\sum \deg_y(Q)=\sum \limits_{k = 0}^{\log n}\mathcal O(2^k) = \mathcal O(n), so the time complexity is still O(nlogn)\mathcal O(n\log n) per iteration, O(nlog2n)\mathcal O(n\log ^ 2 n) in total. When n=0n = 0, an inverse is needed, which can also be done in O(nlogn)\mathcal O(n \log n).
Thus, polynomial composition under modulo xn+1x^{n + 1} is solved in O(nlog2n)\mathcal O(n\log ^2 n) time complexity.

Code

P10249 (n = 2e5) runtime = 11s on luoguCPP
#include <bits/stdc++.h>
using ll = long long;
using ld = long double;
using ull = unsigned long long;
using namespace std;
template <class T>
using Ve = vector<T>;
#define ALL(v) (v).begin(), (v).end()
#define pii pair<ll, ll>
#define rep(i, a, b) for(int i = (a); i <= (b); ++i)
#define per(i, a, b) for(int i = (a); i >= (b); --i)
#define pb push_back
bool Mbe;
ll read() {
    ll x = 0, f = 1; char ch = getchar();
    while(ch < '0' || ch > '9') {if(ch == '-') f = -1; ch = getchar();}
    while(ch >= '0' && ch <= '9') x = x * 10 + ch - '0', ch = getchar();
    return x * f;
}
void write(ll x) {
    if(x < 0) putchar('-'), x = -x;
    if(x > 9) write(x / 10);
    putchar(x % 10 + '0');
}
const ll N = 2e5 + 9;
const ll Mod = 998244353, G = 3, iG = (Mod + 1) / 3;
struct poly {
    Ve<int> a;
    int size() const {return a.size();}
    void resize(int n) {a.resize(n);}
    int operator[] (int n) const {
        assert(0 <= n && n < (int)a.size());
        return a[n];
    } 
    int &operator[] (int n) {
        assert(0 <= n && n < (int)a.size());
        return a[n];
    }
};
int rev[N << 5], tw[N << 5];
int Init(int n) {
    int p = 1, c = 0;
    while(p <= n) p <<= 1, ++c;
    rev[0] = 0;
    rep(i, 1, p - 1) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (c - 1));
    return p;
}
int pw(int x, int p) {
    int res = 1;
    while(p) {
        if(p & 1) res = 1ll * res * x % Mod;
        x = 1ll * x * x % Mod, p >>= 1;
    }
    return res;
}
int Add(int x, int y) {
    return ((x += y) >= Mod) ? (x - Mod) : x;
}
int Sub(int x, int y) {
    return ((x -= y) < 0) ? (x + Mod) : x;
}
void NTT(poly &a, int n, int sgn) {
    rep(i, 0, n - 1) {
        if(i < rev[i]) swap(a[i], a[rev[i]]);
    }
    for(int i = 1; i < n; i <<= 1) {
        int wn = pw((sgn == 1 ? G : iG), (Mod - 1) / (i << 1));
        tw[0] = 1;
        rep(j, 1, i - 1) tw[j] = 1ll * tw[j - 1] * wn % Mod;
        for(int j = 0; j < n; j += (i << 1)) {
            rep(k, 0, i - 1) {
                int x = a[j + k], y = 1ll * a[j + k + i] * tw[k] % Mod;
                a[j + k] = Add(x, y), a[j + k + i] = Sub(x, y);
            }
        }
    }
    if(~sgn) return ;
    int iv = pw(n, Mod - 2);
    rep(i, 0, n - 1) a[i] = 1ll * a[i] * iv % Mod;
    return ;
}
poly operator * (const poly &a, const poly &b) {
    ll n = a.size(), m = b.size();
    ll p = Init(n + m - 1);
    poly f = a, g = b; f.resize(p), g.resize(p);
    NTT(f, p, 1), NTT(g, p, 1);
    rep(i, 0, p - 1) f[i] = 1ll * f[i] * g[i] % Mod;
    NTT(f, p, -1), f.resize(n + m - 1);
    return f;
}
poly Inv(poly h, int n){
    if(n == 1){
    poly f;
        f.resize(1), f[0] = pw(h[0], Mod - 2);
        return f;
    }
    int n0 = (n + 1) >> 1;
    poly h0 = h; h0.resize(n0);
    poly f0 = Inv(h0, n0); f0.resize(n);
    int len = Init(n << 1);
    poly _f0 = f0; _f0.resize(len), h.resize(len);
    NTT(_f0, len, 1), NTT(h, len, 1);
    rep(i, 0, len - 1) _f0[i] = 1ll * _f0[i] * _f0[i] % Mod * h[i] % Mod;
    NTT(_f0, len, -1), _f0.resize(n);
    rep(i, 0, n - 1) f0[i] = (f0[i] * 2ll - _f0[i] + Mod) % Mod;
    return f0;
}
Ve<poly> work(const poly &P, const Ve<poly> &Q, int n) {
    int maxy = (int)Q.size() - 1;
    if(!n) {
        poly Q0; Q0.resize(maxy + 1);
        rep(i, 0, maxy) Q0[i] = Q[i][0];
        Q0 = Inv(Q0, maxy + 1);
        int lenP = P.size(); poly R = P;
        reverse(ALL(R.a));
        R = R * Q0;
        poly res; res.resize(maxy);
        rep(i, 0, (int)R.size() - 1) {
            int cy = i - (lenP - 1);
            if(cy <= 0 && cy >= -maxy + 1) res[-cy] = R[i];
        }
        Ve<poly> ret; ret.resize(maxy);
        rep(i, 0, maxy - 1) ret[i].resize(1), ret[i][0] = res[i];
        return ret;
    }
    int n2 = (n << 1) | 1;
    int len = Init(n2 * max(maxy + 1, maxy * 2) + n2 * (maxy + 1));
    poly pos, neg; pos.resize(len), neg.resize(len);
    rep(i, 0, maxy) {
        rep(j, 0, (int)Q[i].size() - 1) {
            int pwr = i * n2 + j;
            pos[pwr] = Q[i][j];
            if(j & 1) neg[pwr] = (Mod - Q[i][j]);
            else neg[pwr] = Q[i][j];
        }
    }
    NTT(neg, len, 1);
    NTT(pos, len, 1);
    rep(i, 0, len - 1) pos[i] = 1ll * pos[i] * neg[i] % Mod;
    NTT(pos, len, -1);
    Ve<poly> Q0; Q0.resize(maxy * 2 + 1);
    rep(i, 0, maxy * 2) Q0[i].resize((n >> 1) + 1);
    rep(i, 0, (int)pos.size() - 1) {
        int cy = i / n2, cx = i - cy * n2;
        if(cx & 1) continue;
        if(cx > n || cy > maxy * 2) continue;
        Q0[cy][cx >> 1] = pos[i];
    }
    Ve<poly> ret0 = work(P, Q0, n >> 1);
    int len0 = ret0.size();
    reverse(ALL(ret0));
    Init(len - 1);
    pos.a.clear(), pos.resize(len);
    rep(i, 0, len0 - 1) {
        rep(j, 0, (int)ret0[i].size() - 1) {
            int pwr = i * n2 + j * 2;
            pos[pwr] = ret0[i][j];
        }
    }
    NTT(pos, len, 1);
    rep(i, 0, len - 1) pos[i] = 1ll * pos[i] * neg[i] % Mod;
    NTT(pos, len, -1);
    Ve<poly> ret; ret.resize(maxy);
    rep(i, 0, maxy - 1) ret[i].resize(n + 1);
    rep(i, 0, (int)pos.size() - 1) {
        int cy = i / n2, cx = i - cy * n2;
        cy -= (len0 - 1);
        if(cy <= 0 && cy >= -maxy + 1) {
            if(cx <= n) ret[-cy][cx] = pos[i];
        }
    }
    return ret;
}
int n, m;
poly f, g;
bool Med;
int main() {
    cerr << fabs(&Med - &Mbe) / 1048576.0 << "MB\n";
    n = read(), m = read();
    f.resize(n + 1), g.resize(m + 1);
    rep(i, 0, n) f[i] = read();
    rep(i, 0, m) g[i] = Mod - read();
    poly con; con.resize(1), con[0] = 1;
    Ve<poly> h = work(f, {con, g}, n);
    assert(h.size() == 1);
    rep(i, 0, n) write(h[0][i]), putchar(' ');
    putchar('\n');
    cerr << "\n" << clock() * 1.0 / CLOCKS_PER_SEC * 1000 << "ms\n";
    return 0;
}

References:

  1. Yasunori Kinoshita, Baitian Li: Power Series Composition in Near-Linear Time.

评论

65 条评论,欢迎与作者交流。

正在加载评论...