快速读入（solution of P10815）

0x00 写在前面

该题解将会提供多种快速读入/输出的模板并详细讲解原理（包括 mmap 版本的快读和目前的最优解）。
该题解会提供完整的代码（但请不要直接抄袭）。

\texttt{\color{white}其实是把自己的专栏搬过来了……}

IO 优化是很实用 & 简单的常数优化，应该也是大部分 OIer 学习或接触卡常的第一种卡常数方式。当读入数据超过 4MB 时，读入优化可以节省好几倍时间。

0x01 `cin` / `cout` 的优化（关闭同步和解除绑定）

`ios::sync_with_stdio(0)`

这个函数是一个「是否兼容 stdio」的开关，C++ 为了兼容 C，保证程序在使用了 printf 和 cout 的时候不发生混乱，将输出流绑到了一起。同步的输出流是线程安全的。

这其实是 C++ 为了兼容而采取的保守措施，也是使 cin/cout 速度较慢的主要原因。但我们可以在使用 cin/cout 之前将 stdio 解除绑定，能够明显地提升 cin/cout 的 IO 速度，但是在这样做之后要注意不能同时使用 cin 和 scanf，也不能同时使用 cout 和 printf，但是可以同时使用 cin 和 printf，也可以同时使用 scanf 和 cout（即不同的输入或输出方式不能混合使用）。

`cin.tie(0),cout.tie(0)`

tie 是将两个 stream（istream，ostream）绑定的参数，如果括号里填空参数（nullptr），则返回当前的输出流指针。

在默认的情况下 cin.tie() 绑定的是 &std::cout，每次进行格式化输入的时候都要调用 cout.flush() 清空输出缓冲区，这样会减慢 IO 速度。可以通过 cin.tie(nullptr) 来解除绑定，进一步加快执行效率。

需要注意的是，如果你需要在每次输入之前输出结果（如交互题），在进行 cin 和 cout 解绑之后，每次都需要调用 cout.flush() 或 cout << flush。如果输出时刚好需要换行，也可以调用 cout << endl。

下面的程序是一个例子：

CPP

cout << "Please input an integer:";
cout << "\n", cout.flush(); // 刷新缓冲区，也可以换成 cout << "\n" << flush; 或 cout << endl;
cin >> x;

使用示例：

CPP

#include <iostream>
using namespace std;
int main() {
	ios::sync_with_stdio(0), cin.tie(0), cout.tie(0);
// Your code begins here.
	int n, x, ans = 0;
	cin >> n;
	while (n --) cin >> x, ans += x;
	cout << ans;
// End.
	return 0;
}

测试结果：读入

10^7

个整数需要

643

ms 左右，无法在

2500

ms 内读入

10^8

个整数。

0x02 `C` 风格输入的优化

`getchar()`

getchar 是用来读入 1 byte 的数据并将其转换为 char 类型的函数（但实际上的返回值是 int），且速度很快，故可以用「读入字符——转换为整型」来代替缓慢的读入。

我们需要考虑输入的组成部分：整数由两部分组成——符号（一般是负号 -）和数字，整数之间有换行和空格。

10 进制整数中是不含空格或除 0~9 和（正）负号外的其他字符的，因此在读入不应存在于整数中的字符（通常为空格 / 换行）时，就可以判定已经读入结束。

C 和 C++ 语言分别在 ctype.h （C/C++）和 cctype（C++）头文件中，提供了函数 isdigit, 这个函数会检查传入的参数是否为十进制数字字符，是则返回一个非零整数，否则返回零。所以我们判定一个字符是否是十进制数字字符时可以使用 isdigit(c) 代替 c >= '0' && c <= '9'，也可以使用 !isdigit(c) 代替 c < '0' || c > '9'（而且 isdigit 的速度比 c >= '0' && c <= '9' 更快）。

下面是使用示例：

CPP

#include <cstdio>
#include <cctype>
int readint() {
	int k = 0, f = 1, c = getchar(); // 实际上变量 c 没有必要转成 char
	for (; !isdigit(c); c = getchar()) if (c == '-') f = -1; // 由于 isdigit 的接收值也是 int，所以确实没有必要转成 char
	for (; isdigit(c); c = getchar()) k = k * 10 + (c ^ 48); // 也可以写 (c & 15)
	return k * f;
}
int main() {
	int n = readint(), ans = 0;
	while (n --) ans += readint();
	printf("%d", ans);
}

评测结果：读入

10^7

个整数需要

290

ms 左右，但仍然无法在

2500

ms 内读入

10^8

个整数。

`getchar_unlocked` 函数：

普通的 getchar 带线程锁，然而 OI 用不上多线程（~~你用多线程就算作弊~~），所以我们在日常使用中可以把它关闭，以达到提高输入效率的目的，故可用更快的 getchar_unlocked 代替。

线程锁：

在多线程环境中，多个线程可能同时访问共享资源（举例如文件、内存、全局变量），线程锁在这个时候用于强制互斥访问：某个线程访问资源前先上锁，其他线程就阻塞在一边排队等待；操作完成后释放锁，允许其他线程访问。这可以防止 Data Race 导致程序崩溃或数据错误。

getchar 本身是线程安全的，它每调用一次就要先上锁，再读一个字符，再开锁，来保证安全。但是如果只有一个线程访问标准输入（事实上 OI 中整个程序只有一个线程），这个频繁开关锁的过程就没有必要了。

所以使用不带线程锁直接读入的 getchar_unlocked 可以加快读入速度。

注意：getchar_unlocked 需要在 Linux 下才能通过编译，Windows 下需要使用 _getchar_nolock

使用示例：

getchar_unlocked() 和 getchar() 在功能上基本一致，所以只需要将上述代码中的 getchar 替换成 getchar_unlocked 即可。

CPP

#include <cstdio>
#include <cctype>
int readint() {
	int k = 0, f = 1, c = getchar_unlocked();
	for (; !isdigit(c); c = getchar_unlocked()) if (c == '-') f = -1;
	for (; isdigit(c); c = getchar_unlocked()) k = k * 10 + (c ^ 48);
	return k * f;
}
int main() {
	int n = readint(), ans = 0;
	while (n --) ans += readint();
	printf("%d", ans);
}

评测结果：读入

10^7

个整数需要

182

ms 左右，并能在

1950

ms 内读入

10^8

个整数。

附：更方便本地运行的宏定义

由于 getchar_unlocked() 在 Windows 系统上没有定义，所以在本地无法运行。因此我们可以在代码中添加以下宏定义方便自己进行调试。

CPP

#ifdef __unix__
#define gc getchar_unlocked
#else
#define gc _getchar_nolock
#endif

该代码在常见的 Linux 和 Windows 环境中都不会 CE。

0x03 `fread` 快读

~~目前来讲很多人认为的最快的读入方式（事实证明显然不是）。~~

1. 介绍

通过 fread 可以实现更快的读入。

fread 能将需要的文件部分读入内存缓冲区。

由于 fread 是整段整段读取，所以比 getchar() 要快的多。

fread 类似于参数为 "%s" 的 scanf，不过它更为快速，而且可以一次性读入若干个字符（包括空格、换行，甚至可以是不可见字符），如果缓存区足够大，甚至可以一次性读入整个文件。

以下是 fread 的函数原型（C）：

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

函数 fread 会从 stream 流中读取 nmemb 个长度为 size 字节大小的数据，并将它们存储在 ptr 给定的位置，返回读取的字节数。

如 fread(buf, 1, 114514, stdin); 就是从标准输入流中读取

114514

个 1 字节大小的字符，存储在 buf 里。

2. 使用

我们可以重定义一个 getchar() 函数（以下是 gc 函数）。可以一次性读入

2^{20}

个字符到 buf，然后从 buf 中读入一个 char，也就是头指针向后移动一位。

CPP

char buf[1<<20], *p1, *p2;
char gc() { // getchar
	if (p1 == p2) { // 读入的 2^20 个字符已经被读完
		p1 = buf;
		p2 = buf + fread(buf, 1, 1<<20, stdin); // 再读 2^20 个字符
		if (p1 == p2) return EOF; // p1 == p2，说明没有读入成功，即读到文件结束符
	}
	return *p1++; // 读入的 2^20 个字符没有用完
}
int readint() { // 普通的快读，仅更改 getchar() 为 gc()
	int k = 0, f = 1, c = gc();
	for (; !isdigit(c); c = gc()) if (c == '-') f = -1;
	for (; isdigit(c); c = gc()) k = k * 10 + (c ^ 48);
	return k * f;
}

当然上面的写法是为了方便理解，实际上多次调用函数很影响效率，甚至添加 inline 也是无效的，导致上述代码甚至没有普通快读快，所以一般我们把它写成 #define 形式。

CPP

#include <cctype>
#include <cstdio>
#include <cstdint>
uint8_t buf[1<<20], *p1, *p2;
#define gc() (p1==p2 && (p2=(p1=buf)+fread(buf,1,1<<20,stdin)),*p1++)
int readint() {
	int k = 0, f = 1, c = gc();
	for (; !isdigit(c); c = gc()) if (c == '-') f = -1;
	for (; isdigit(c); c = gc()) k = k * 10 + (c ^ 48);
	return k * f;
}
int main() {
	int n = readint(), ans = 0;
	while (n --) ans += readint();
	printf("%d", ans);
}

评测结果：读入

10^7

个整数需要

137

ms 左右，并能在

1450

ms 内读入

10^8

个整数。

注：

调试的时候，输入完成后请手动输入 EOF（Linux 上为 Ctrl + D，Windows 上为 Ctrl + Z）。
使用此快读时请不要和其它输入方式混用（包括 getchar、scanf、cin），否则会因为缓存问题产生错误。（除非你非常了解原理，而且清楚自己在做什么）

0x04 `mmap` 快读

冷门函数。

1. 介绍

mmap 是 Linux 系统函数，它的功能是将文件一次性地映射到内存中（注意不是拷贝）。在一些场景下有更优的速度。

注意 mmap 不能在 Windows 系统中通过编译，同时也不建议在正式的 OI 赛制的比赛中使用（但在很多卡常大赛中使用的频率很高）。且在使用前需要引入头文件 fcntl.h，unistd.h，sys/stat.h 与 sys/mman.h。

读入示例：首先要获取文件描述符 fd，然后通过 fstat 获取文件信息以得到文件大小，此后通过 char *pc = (char *) mmap(NULL, state.st_size, PROT_READ, MAP_PRIVATE, fd, 0); 将指针 *pc 指向我们的文件。可以直接用 *pc ++ 替代 getchar()。

当我们要提交不使用文件操作的题目时，可以将 fd 设为 0，表示从 stdin 读入。但是，对 stdin 使用 mmap 是极其危险的行为，同时不能在终端输入，所以不建议使用 mmap 从标准输入中读入。

mmap 函数的使用示例（该代码实现了读入一个整数并输出）：

CPP

#include <bits/stdc++.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define gc() (*pc++)
char *pc;
int readint() {
	int k = 0, f = 1, c = gc(); // getchar 返回值都是 int，当然没有必要转成 char
	for (; !isdigit(c); c = gc()) if (c == '-') f = -1;
	for (; isdigit(c); c = gc()) k = k * 10 + (c ^ 48); // 也可以写 (c & 15)
	return k * f;
}

int main() {
	int fd = open("*.in", O_RDONLY);
	struct stat state;
	fstat(fd, &state);
	pc = (char *)mmap(NULL, state.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	printf("%d", readint());
}

2. `mmap` 在快读上的使用

2.1（基础用法）

直接照搬 fread，然后把 fread 改为 mmap，但此时我们不需要 buf 数组，而是使用指针（char* 或 uint8_t*）代替。

CPP

#include <cctype>
#include <cstdio>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
uint8_t* str;
#define gc() (*str ++)
int readint() {
	int k = 0, f = 1; uint8_t c = gc();
	for (; !isdigit(c); c = gc()) if (c == '-') f = -1;
	for (; isdigit(c); c = gc()) k = k * 10 + (c ^ 48);
	return k * f;
}
int main() {
	struct stat s;
	fstat(0,&s);
	str=(uint8_t*)mmap(NULL,s.st_size,1,2,0,0); // PROT_READ, MAP_PRIVATE 分别是 1,2
// ↑以上是 mmap 进行读入操作
	int n = readint(), ans = 0;
	while (n --) ans += readint();
	printf("%d", ans);
}

评测结果：读入

10^7

个整数需要

124

ms 左右，并能在

1280

ms 内读入

10^8

个整数。

2.2 打表优化（非常规优化方式）

思想来源于曾经的最优解。

可以发现我们现在读入字符仍然是一个一个字符读入的（因为是 *str ++，每次只移动一个字节）。如果我们每次读入的字符大于一个，可以更加节省时间。

考虑到一个 char / uint8_t 的范围为

-128\sim 127

或

0\sim 255

，一共是

256

种取值，所以两个 char 组合一共有

65536

种情况。

可以发现

65536

是一个比较小的数字，开一个长度为

65536

的数组进行打表也是完全可行的。所以我们可以打出两个字符拼成的数字是什么，然后使用下标访问得到答案。

打表代码：

CPP

uint32_t a[65536+1];
memset(a, -1, sizeof a);
for (int i = 48; i <= 57; ++ i)
	for (int j = 48; j <= 57; ++ j)
		a[i << 8 | j] = (j ^ 48) * 10 + (i ^ 48);

读入整数：

CPP

int v, f; // v: 输入数字的绝对值；f：输入数字的符号（0 正；1 负）
for (; n --; ) {
	c+=f=*c==45; // 等价于 { f = 0; if (*c == '-') f = 1, ++ c; } // 读到负号，则记录，并跳过负号
	v=0;
	if (~a[*(uint16_t*)(c)]) v=a[*(uint16_t*)c],c+=2; // 一次读入两个数字
	if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
	if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
	if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
	if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
	if (*c>47) v=v*10+((*c++)^48); // 一次读入一个数字（如果长度是奇数就会执行到这里）
	ans+=(f?-v:v);
	while (isspace(*c)) ++c; // 跳过空格/换行
}

需要注意的是，这里一次性得到两个字符需要使用指针强制转换，因为 *c<<16|*c 中会增加两次位运算。

完整代码：

CPP

#include<stdio.h>
#include<ctype.h>
#include<stdint.h>
#include<string.h>
#include<sys/stat.h>
#include<sys/mman.h>

int main(){
	int a[0x10000+1]; // 打表数组（0x10000 == 65536）
	memset(a,-1,0x40000);
	for (int i=48;i<=57;++i)
		for (int j=48;j<=57;++j)
			a[i<<8|j]=(j^48)*10+(i^48);
	struct stat s;
	fstat(0,&s);
	char* c=(char*)mmap(0, s.st_size, PROT_READ, MAP_SHARED, 0, 0);
	int n = 0, ans=0;
	while (isdigit(*c)) n=n*10+((*c++)^48);
	while (isspace(*c)) ++c; // 跳过空格和换行
	int v, f; // v: 输入数字的绝对值；f：输入数字的符号（0 正；1 负）
	for (; n --; ) {
		c+=f=*c==45; // 等价于 { f = 0; if (*c == '-') f = 1, ++ c; } // 读到负号，则记录，并跳过负号
		v=0;
		if (~a[*(uint16_t*)(c)]) v=a[*(uint16_t*)c],c+=2; // 一次读入两个数字
		if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
		if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
		if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
		if (~a[*(uint16_t*)(c)]) v=v*100+a[*(uint16_t*)c],c+=2;
		if (*c>47) v=v*10+((*c++)^48); // 一次读入一个数字（如果长度是奇数就会执行到这里）
		ans+=(f?-v:v);
		while (isspace(*c)) ++c; // 跳过空格/换行
	} printf("%d", ans);
}

评测结果：读入

10^7

个整数需要

63

ms 左右，并能在

626

ms 内读入

10^8

个整数。

2.3 其它优化

我们的优化并没有到达极限。

我们发现 a 数组中的数字是由 $12336$ 个 -1 + 每两个字符对应的数字 + $50786$ 个 -1 组成的，所以我们可以将 a 数组定义成 const 类型。但很显然这样会增加代码长度，所以我们需要使用 #define 减小代码长度。
在 5 个 if 语句中我们可以发现越前面的语句越容易满足，所以我们可以使用 __builtin_expected(CONDITION,0/1) 告诉编译器条件 CONDITION 更偏向 False(0)/True(1)（根据题目的输入进行优化）。
发现输入的数字 $\in[-n,n]$ ，其中 $n_{max}=10^8$ ，说明输入数字长度最多为 9，所以可以去除一个 if。
发现第二行输入的数中间都是一个空格，所以可以用一个 ++c 代替 while。

根据优化 1，我们可以像这样初始化

a

数组。

CPP

#define N1 -1,-1,-1,-1,-1,-1,-1,-1,-1,-1
#define N2 N1,N1,N1,N1,N1,N1,N1,N1,N1,N1
#define N3 N2,N2,N2,N2,N2,N2,N2,N2,N2,N2
#define N4 N3,N3,N3,N3,N3,N3,N3,N3,N3,N3
#define D(a) a,1##a,2##a,3##a,4##a,5##a,6##a,7##a,8##a,9##a
#define S246 N2,N2,N1,N1,N1,N1,-1,-1,-1,-1,-1,-1
const int _mp[0x10000]={
N4,N3,N3,N2,N2,N2,N1,N1,N1,-1,-1,-1,-1,-1,-1, // 12336个-1
D(0),S246,D(1),S246,D(2),S246,D(3),S246,D(4),S246, // 数字组0-4
D(5),S246,D(6),S246,D(7),S246,D(8),S246,D(9), // 数字组5-9
N4,N4,N4,N4,N4,N2,N2,N2,N2,N2,N2,N2,N2,N1,N1,N1,N1,N1,N1,N1,N1,
-1,-1,-1,-1,-1,-1
};
#undef N1
#undef N2
#undef N3
#undef N4
#undef D
#undef S246

700

多个字符就完成初始化了。

根据优化 2，我们可以把第一个 if 语句写成 if (__builtin_expect(~a[*(uint16_t*)(c)], 1)) v=a[*(uint16_t*)c],c+=2;

所以我们得到了以下代码（但是用 and 短路代替了常规的 if）：

CPP

#include<stdio.h>
#include<ctype.h>
#include<stdint.h>
#include<string.h>
#include<sys/stat.h>
#include<sys/mman.h>

#define N1 -1,-1,-1,-1,-1,-1,-1,-1,-1,-1
#define N2 N1,N1,N1,N1,N1,N1,N1,N1,N1,N1
#define N3 N2,N2,N2,N2,N2,N2,N2,N2,N2,N2
#define N4 N3,N3,N3,N3,N3,N3,N3,N3,N3,N3
#define D(a) a,1##a,2##a,3##a,4##a,5##a,6##a,7##a,8##a,9##a
#define S246 N2,N2,N1,N1,N1,N1,-1,-1,-1,-1,-1,-1
const int _mp[0x10000]={
N4,N3,N3,N2,N2,N2,N1,N1,N1,-1,-1,-1,-1,-1,-1, // 12336个-1
D(0),S246,D(1),S246,D(2),S246,D(3),S246,D(4),S246, // 数字组0-4
D(5),S246,D(6),S246,D(7),S246,D(8),S246,D(9), // 数字组5-9
N4,N4,N4,N4,N4,N2,N2,N2,N2,N2,N2,N2,N2,N1,N1,N1,N1,N1,N1,N1,N1,
-1,-1,-1,-1,-1,-1
};
#undef N1
#undef N2
#undef N3
#undef N4
#undef D
#undef S246
int main(){
	struct stat s; fstat(0,&s); uint8_t* c=(uint8_t*)mmap(NULL,s.st_size,1,2,0,0);
	int v,f,n=0,ans=0;
	while (isdigit(*c)) n=n*10+(15&*c++);
	while (isspace(*c)) ++c;
	for (;n--;++c){
		c+=f=(*c==45);
		v=0;
		__builtin_expect(~_mp[*(uint16_t*)(c)], 1) && (v=_mp[*(uint16_t*)(c)],c+=2),
		(~_mp[*(uint16_t*)(c)]) && (v=v*100+_mp[*(uint16_t*)(c)],c+=2),
		(~_mp[*(uint16_t*)(c)]) && (v=v*100+_mp[*(uint16_t*)(c)],c+=2),
		(~_mp[*(uint16_t*)(c)]) && (v=v*100+_mp[*(uint16_t*)(c)],c+=2),
		isdigit(*c) && (v=v*10+(15&*c++)), ans+=(f?-v:v);
	}
	printf("%d", ans);
	return 0;
}

评测结果（C++）：读入

10^7

个整数需要

53

ms 左右，并能在

464

ms 内读入

10^8

个整数。

评测结果（C）：读入

10^7

个整数需要

52

ms 左右，并能在

451

ms 内读入

10^8

个整数。

~~然后就拿下了这道题的最优解（除旧数据和套数据点外）~~

2.4 可能的继续优化方向

以下是可能能够提升速度的方向：

找到 $3$ 字节整数类型（int24），然后将打表一次两个字符改为三个字符。
减少类型转换次数。
有更快的函数能替代 isdigit/isspace（已知比较运算和位运算不可能更快）。
~~更多玄学做法/人类智慧~~。

2.5 封装成模板的写法

转载自 masonxiong，已经过作者同意。

该版本兼容性更好，安全性更高，而且能同时在 Linux 和 Windows 下运行，需要 C++ 11 及以上的编译器才能够运行。（虽然效率不及原版本）

CPP

#include <cstdio>
#include <cctype>
#include <cstring>
#include <cstdint>
#ifdef __linux__
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#endif

namespace {
	static const std::int32_t* const table = []() -> const std::int32_t* {
		static std::int32_t result[0x10000];
		std::memset(result, -1, sizeof(result));
		for (std::int32_t i = 48; i != 58; ++i)
			for (std::int32_t j = 48; j != 58; ++j)
				result[i << 8 | j] = (j & 15) * 10 + (i & 15);
		return result;
	}();

	static const char* cursor = []() -> const char* {
	#ifdef __linux__
		struct stat status;
		fstat(STDIN_FILENO, &status);
		return reinterpret_cast<const char*>(mmap(nullptr, status.st_size, PROT_READ, MAP_PRIVATE, STDIN_FILENO, 0));
	#else
		static char buffer[16 << 20];
		std::fread(buffer, sizeof(char), sizeof(buffer) / sizeof(char), stdin);
		return buffer;
	#endif
	}();

	static inline auto r32() -> std::int32_t {
		std::int32_t number = 0;
		bool sign = *cursor == 45 && ++cursor;
		~table[*reinterpret_cast<const std::uint16_t*>(cursor)] && (number = number * 100 + table[*reinterpret_cast<const std::uint16_t*>(cursor)], cursor += 2);
		~table[*reinterpret_cast<const std::uint16_t*>(cursor)] && (number = number * 100 + table[*reinterpret_cast<const std::uint16_t*>(cursor)], cursor += 2);
		~table[*reinterpret_cast<const std::uint16_t*>(cursor)] && (number = number * 100 + table[*reinterpret_cast<const std::uint16_t*>(cursor)], cursor += 2);
		~table[*reinterpret_cast<const std::uint16_t*>(cursor)] && (number = number * 100 + table[*reinterpret_cast<const std::uint16_t*>(cursor)], cursor += 2);
		~table[*reinterpret_cast<const std::uint16_t*>(cursor)] && (number = number * 100 + table[*reinterpret_cast<const std::uint16_t*>(cursor)], cursor += 2);
		for (std::isdigit(*cursor) && (number = number * 10 + (*cursor++ & 15)); std::isspace(*++cursor); );
		return sign ? -number : number;
	}
}

int main() {}

0x05 输出部分

这部分与本题无关，为了节省篇幅，放在云剪切板上。

0x06 IO 优化的弊端

IO 优化虽然能够显著提升程序的运行速度，但并非没有缺点。

IO 优化可能会导致程序可读性下降，尤其是使用 fread、mmap 或宏定义形式的函数会导致调试困难。
容易引入隐蔽 Bug，比如缓冲区溢出，没有特判 0 或 -2147483648（以 32 位整型变量为例），或者将 EOF（文件结束符）“吃掉”。
优化可能适得其反，比如递归形式的普通快写在很多时候会比去除同步流的 cout 更慢。
其它低级错误导致代码挂掉。

0x07 THE END

部分内容参考了 oi-wiki 和其它题解。
如果有问题/更好的做法欢迎指出。
~~点赞后再走。~~

快速读入（solution of P10815）

文章操作

0x00 写在前面

0x01 `cin` / `cout` 的优化（关闭同步和解除绑定）

`ios::sync_with_stdio(0)`

`cin.tie(0),cout.tie(0)`

使用示例：

0x02 `C` 风格输入的优化

`getchar()`

`getchar_unlocked` 函数：

线程锁：

使用示例：

附：更方便本地运行的宏定义

0x03 `fread` 快读

1. 介绍

2. 使用

0x04 `mmap` 快读

1. 介绍

2. `mmap` 在快读上的使用

2.1（基础用法）

2.2 打表优化（非常规优化方式）

2.3 其它优化

2.4 可能的继续优化方向

2.5 封装成模板的写法

0x05 输出部分

0x06 IO 优化的弊端

0x07 THE END

相关推荐

评论

快速读入（solution of P10815）

文章操作

0x00 写在前面

0x01 cin / cout 的优化（关闭同步 和 解除绑定）

ios::sync_with_stdio(0)

cin.tie(0),cout.tie(0)

使用示例：

0x02 C 风格输入的优化

getchar()

getchar_unlocked 函数：

线程锁：

使用示例：

附：更方便本地运行的宏定义

0x03 fread 快读

1. 介绍

2. 使用

0x04 mmap 快读

1. 介绍

2. mmap 在快读上的使用

2.1（基础用法）

2.2 打表优化（非常规优化方式）

2.3 其它优化

2.4 可能的继续优化方向

2.5 封装成模板的写法

0x05 输出部分

0x06 IO 优化的弊端

0x07 THE END

相关推荐

评论

0x01 `cin` / `cout` 的优化（关闭同步和解除绑定）

`ios::sync_with_stdio(0)`

`cin.tie(0),cout.tie(0)`

0x02 `C` 风格输入的优化

`getchar()`

`getchar_unlocked` 函数：

0x03 `fread` 快读

0x04 `mmap` 快读

2. `mmap` 在快读上的使用