An easier method of polynomial composition: No more transposition theorem is needed!

Just some random thoughts arised by a simple problem. But it seems that I'm reinventing wheels once more :(

If there's any mistake in the article, please don't hesitate to point it out in the comment.

Introduction

Yesterday night, my friend asked me a question on polynomials (his profile picture is pixelated in order to prevent privacy issues):

(Translation: Ciallo! Is there any solution for

[x^n]f^k(x)

for all

k=1\sim n

\mathcal O(n ^ 2)

time complexity that is easy to implement?)

And unfortunately I didn't figure out the solution at first. I even considered it as an unsolvable problem:

(Translation: It's actually unsolvable. I guess, isn't it?)

But I suddenly realized that in the latest algorithm for polynomial composition, we have done a similar process working on

f^k(x)

So it might be solvable in

\mathcal O(n\log ^ 2n)

Why not give it a try?

Solution

We need to get coefficients for every possible

k

. Let's introduce another term

y

to control the number of

f

in a specific product, and rewrite the coefficient as

[y^kx^n]1+yf(x)+y^2f^2(x)+\dots=[y^kx^n]\frac{1}{1-yf(x)}

Do you remember how we handled the linear homogeneous recursions using polynomials? We transformed it into an evaluation of a certain coefficient, in the form of a fraction:

[x^k]\frac{P(x)}{Q(x)}

. We apply Bostan-Mori algorithm on this fraction afterwards.

For the bivariate polynomial with

x, y

, we could apply a similar technique. Let's say if we are now finding the coefficient

[y^kx^n]\frac{P(x, y)}{Q(x, y)}

. We multiply by

Q(-x, y)

on both numerator and denominator to eliminate odd powers of

x

on the denominator, resulting in

[y^kx^n]\frac{P(x, y)Q(-x, y)}{Q'(x ^ 2, y)}

, where

Q'(x^2, y) = Q(x, y)Q(-x, y)

After this iteration, we can only conserve about half of the elements depending on the parity of

n

: if

n

is even, we rewrite it as

\frac{P'(x^2, y)}{Q'(x^2, y)}

, otherwise we rewrite it as

\frac{xP'(x^2, y)}{Q'(x^2, y)}

. Now the problem is reduced to a subproblem with same form with the degree of

x

halved. When

n = 0

, the variable

x

is entirely eliminated, and we only need to get

[y^k]\frac{P(y)}{Q(y)}

using polynomial inversion in

\mathcal O(n\log n)

time. (Can be done in

\mathcal O(n)

, but not necessary.)

Consider the time complexity of the algorithm: after each iteration, although the degree of

y

is lifted to twice as before, the degree of

x

is halved. Thus, for every polynomial involved in the calculation, the product of degree for

x, y

is always

\mathcal O(n)

, so we can multiply two polynomials within

\mathcal O(n\log n)

time.

We handled

\mathcal O(\log n)

multiplications, because after this number of iterations we will drop into the corner case of

n = 0

. Thus, the whole time complexity is

\mathcal O(n\log ^ 2 n)

Polynomial Composition

Now let us move on to polynomial composition. Suppose we have two polynomials

f, g

, and a new polynomial

g(f(x)) = g_0+g_1f^1(x)+g_2f^2(x)+\dots+g_nf^n(x)

is required. (Assume that calculation is done under

\bmod \ x^{n + 1}

.) The expression seems to have many similarities with the previous question.

We need to assign a coefficient for each single

f^k(x)

depending on

g(x)

. We can still maintain the variable

y

, since we know that the term with

y^k

should be multiplied by

g_k

Thus, we are actually multiplying the whole expression with

g_0y^0+g_1y^{-1}+g_2y^{-2}+\dots+g_ny^{-n}

to let every valid term be of

y^0

. With the help of this polynomial, the answer could be written as:

[y^0]\frac{g(y^{-1})}{1-yf(x)}\bmod \ x^{n + 1}

The following step is just the same as we have done in the previous part. Moreover, we need to store every coefficient from

x^0

x^n

, which is slightly different. But an advantage is that the numerator only relates to

y

, instead of both

x, y

Suppose we are handling

\frac{P(y)}{Q(x, y)}

. Still multiplying by

Q(-x, y)

, getting

\frac{P(y)Q(-x, y)}{Q'(x ^ 2, y)}

. But the problem is: if we still do casework on the parity of

n

, we will get two branches, leading to wrong time complexity as

T(m)=2T(m/2)+\mathcal O(n\log n)

Can we avoid operations on

x

? Putting

\frac{P(y)}{Q'(x^2, y)}

into the recursion might be a good choice. Now we have the results for

\frac{P(y)}{Q'(x^2, y)}

，and we need to multiply it by

Q(-x, y)

For the first iteration, as we need coefficient of

[y ^ 0]

, we just need to focus on

[y^0]

and

[y ^ {-1}]

\frac{P(y)}{Q'(x^2, y)}

, since

\deg_y(Q) = 1

. Now

\deg_y(Q')=2

, and we need

[y^0], [y^{-1}],[y^{-2}], [y^{-3}]

in the next iteration. So the number of needed values is depending on

\deg_y(Q)

. More specifically, we just need to figure out

[y^{-\deg_y(Q) + 1}]\sim [y^0]

in order to correctly deal with the previous iterations.

As we have said in the previous part,

\deg_x\deg_y = \mathcal O(n)

, and

\sum \deg_y(Q)=\sum \limits_{k = 0}^{\log n}\mathcal O(2^k) = \mathcal O(n)

, so the time complexity is still

\mathcal O(n\log n)

per iteration,

\mathcal O(n\log ^ 2 n)

in total. When

n = 0

, an inverse is needed, which can also be done in

\mathcal O(n \log n)

Thus, polynomial composition under modulo

x^{n + 1}

is solved in

\mathcal O(n\log ^2 n)

time complexity.

Code

P10249 (n = 2e5) runtime = 11s on luogu

CPP

#include <bits/stdc++.h>
using ll = long long;
using ld = long double;
using ull = unsigned long long;
using namespace std;
template <class T>
using Ve = vector<T>;
#define ALL(v) (v).begin(), (v).end()
#define pii pair<ll, ll>
#define rep(i, a, b) for(int i = (a); i <= (b); ++i)
#define per(i, a, b) for(int i = (a); i >= (b); --i)
#define pb push_back
bool Mbe;
ll read() {
    ll x = 0, f = 1; char ch = getchar();
    while(ch < '0' || ch > '9') {if(ch == '-') f = -1; ch = getchar();}
    while(ch >= '0' && ch <= '9') x = x * 10 + ch - '0', ch = getchar();
    return x * f;
}
void write(ll x) {
    if(x < 0) putchar('-'), x = -x;
    if(x > 9) write(x / 10);
    putchar(x % 10 + '0');
}
const ll N = 2e5 + 9;
const ll Mod = 998244353, G = 3, iG = (Mod + 1) / 3;
struct poly {
    Ve<int> a;
    int size() const {return a.size();}
    void resize(int n) {a.resize(n);}
    int operator[] (int n) const {
        assert(0 <= n && n < (int)a.size());
        return a[n];
    } 
    int &operator[] (int n) {
        assert(0 <= n && n < (int)a.size());
        return a[n];
    }
};
int rev[N << 5], tw[N << 5];
int Init(int n) {
    int p = 1, c = 0;
    while(p <= n) p <<= 1, ++c;
    rev[0] = 0;
    rep(i, 1, p - 1) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (c - 1));
    return p;
}
int pw(int x, int p) {
    int res = 1;
    while(p) {
        if(p & 1) res = 1ll * res * x % Mod;
        x = 1ll * x * x % Mod, p >>= 1;
    }
    return res;
}
int Add(int x, int y) {
    return ((x += y) >= Mod) ? (x - Mod) : x;
}
int Sub(int x, int y) {
    return ((x -= y) < 0) ? (x + Mod) : x;
}
void NTT(poly &a, int n, int sgn) {
    rep(i, 0, n - 1) {
        if(i < rev[i]) swap(a[i], a[rev[i]]);
    }
    for(int i = 1; i < n; i <<= 1) {
        int wn = pw((sgn == 1 ? G : iG), (Mod - 1) / (i << 1));
        tw[0] = 1;
        rep(j, 1, i - 1) tw[j] = 1ll * tw[j - 1] * wn % Mod;
        for(int j = 0; j < n; j += (i << 1)) {
            rep(k, 0, i - 1) {
                int x = a[j + k], y = 1ll * a[j + k + i] * tw[k] % Mod;
                a[j + k] = Add(x, y), a[j + k + i] = Sub(x, y);
            }
        }
    }
    if(~sgn) return ;
    int iv = pw(n, Mod - 2);
    rep(i, 0, n - 1) a[i] = 1ll * a[i] * iv % Mod;
    return ;
}
poly operator * (const poly &a, const poly &b) {
    ll n = a.size(), m = b.size();
    ll p = Init(n + m - 1);
    poly f = a, g = b; f.resize(p), g.resize(p);
    NTT(f, p, 1), NTT(g, p, 1);
    rep(i, 0, p - 1) f[i] = 1ll * f[i] * g[i] % Mod;
    NTT(f, p, -1), f.resize(n + m - 1);
    return f;
}
poly Inv(poly h, int n){
    if(n == 1){
    poly f;
        f.resize(1), f[0] = pw(h[0], Mod - 2);
        return f;
    }
    int n0 = (n + 1) >> 1;
    poly h0 = h; h0.resize(n0);
    poly f0 = Inv(h0, n0); f0.resize(n);
    int len = Init(n << 1);
    poly _f0 = f0; _f0.resize(len), h.resize(len);
    NTT(_f0, len, 1), NTT(h, len, 1);
    rep(i, 0, len - 1) _f0[i] = 1ll * _f0[i] * _f0[i] % Mod * h[i] % Mod;
    NTT(_f0, len, -1), _f0.resize(n);
    rep(i, 0, n - 1) f0[i] = (f0[i] * 2ll - _f0[i] + Mod) % Mod;
    return f0;
}
Ve<poly> work(const poly &P, const Ve<poly> &Q, int n) {
    int maxy = (int)Q.size() - 1;
    if(!n) {
        poly Q0; Q0.resize(maxy + 1);
        rep(i, 0, maxy) Q0[i] = Q[i][0];
        Q0 = Inv(Q0, maxy + 1);
        int lenP = P.size(); poly R = P;
        reverse(ALL(R.a));
        R = R * Q0;
        poly res; res.resize(maxy);
        rep(i, 0, (int)R.size() - 1) {
            int cy = i - (lenP - 1);
            if(cy <= 0 && cy >= -maxy + 1) res[-cy] = R[i];
        }
        Ve<poly> ret; ret.resize(maxy);
        rep(i, 0, maxy - 1) ret[i].resize(1), ret[i][0] = res[i];
        return ret;
    }
    int n2 = (n << 1) | 1;
    int len = Init(n2 * max(maxy + 1, maxy * 2) + n2 * (maxy + 1));
    poly pos, neg; pos.resize(len), neg.resize(len);
    rep(i, 0, maxy) {
        rep(j, 0, (int)Q[i].size() - 1) {
            int pwr = i * n2 + j;
            pos[pwr] = Q[i][j];
            if(j & 1) neg[pwr] = (Mod - Q[i][j]);
            else neg[pwr] = Q[i][j];
        }
    }
    NTT(neg, len, 1);
    NTT(pos, len, 1);
    rep(i, 0, len - 1) pos[i] = 1ll * pos[i] * neg[i] % Mod;
    NTT(pos, len, -1);
    Ve<poly> Q0; Q0.resize(maxy * 2 + 1);
    rep(i, 0, maxy * 2) Q0[i].resize((n >> 1) + 1);
    rep(i, 0, (int)pos.size() - 1) {
        int cy = i / n2, cx = i - cy * n2;
        if(cx & 1) continue;
        if(cx > n || cy > maxy * 2) continue;
        Q0[cy][cx >> 1] = pos[i];
    }
    Ve<poly> ret0 = work(P, Q0, n >> 1);
    int len0 = ret0.size();
    reverse(ALL(ret0));
    Init(len - 1);
    pos.a.clear(), pos.resize(len);
    rep(i, 0, len0 - 1) {
        rep(j, 0, (int)ret0[i].size() - 1) {
            int pwr = i * n2 + j * 2;
            pos[pwr] = ret0[i][j];
        }
    }
    NTT(pos, len, 1);
    rep(i, 0, len - 1) pos[i] = 1ll * pos[i] * neg[i] % Mod;
    NTT(pos, len, -1);
    Ve<poly> ret; ret.resize(maxy);
    rep(i, 0, maxy - 1) ret[i].resize(n + 1);
    rep(i, 0, (int)pos.size() - 1) {
        int cy = i / n2, cx = i - cy * n2;
        cy -= (len0 - 1);
        if(cy <= 0 && cy >= -maxy + 1) {
            if(cx <= n) ret[-cy][cx] = pos[i];
        }
    }
    return ret;
}
int n, m;
poly f, g;
bool Med;
int main() {
    cerr << fabs(&Med - &Mbe) / 1048576.0 << "MB\n";
    n = read(), m = read();
    f.resize(n + 1), g.resize(m + 1);
    rep(i, 0, n) f[i] = read();
    rep(i, 0, m) g[i] = Mod - read();
    poly con; con.resize(1), con[0] = 1;
    Ve<poly> h = work(f, {con, g}, n);
    assert(h.size() == 1);
    rep(i, 0, n) write(h[0][i]), putchar(' ');
    putchar('\n');
    cerr << "\n" << clock() * 1.0 / CLOCKS_PER_SEC * 1000 << "ms\n";
    return 0;
}

References:

Yasunori Kinoshita, Baitian Li: Power Series Composition in Near-Linear Time.
alpha1022: 多项式不存在了：多项式复合（逆）的 O(nlog^2n) 做法.

An easier method of polynomial composition: No more transposition theorem is needed!

文章操作

Introduction

Solution

Polynomial Composition

Code

References:

相关推荐

评论

An easier method of polynomial composition: No more transposition theorem is needed!