About the Author

Zhu Hui worked on simulators for several years, then on GDB for several years, and then spent several years on Linux kernel optimization for Xiaomi TV, mainly in memory management (MM).

He is now a software engineer at HyperHQ.

Zhu Hui: How the Linux kernel code computes iowait time, with recommended further reading on the kernel

I came across the article "The exact meaning of I/O wait time in Linux" in a Linux publication I like. I thought it was well written, but it was a pity that it did not go down to the source code, so I read the code myself.

When a task has to wait for I/O, the kernel switches it out so that runnable tasks can run first. Before switching the task out, the kernel sets the task's in_iowait to 1, and when the task is woken up and runs again, in_iowait is restored to its original value. The related functions are io_schedule, io_schedule_timeout, mutex_lock_io, and mutex_lock_io_nested.
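The save-and-restore pattern described above can be sketched in plain userspace C. This is a simplified model, not kernel source: struct task_model, model_schedule, and model_io_schedule are hypothetical names introduced for illustration, and the real scheduler does far more work.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's task_struct; only the field
 * discussed in the text is modeled. */
struct task_model {
    bool in_iowait;      /* set while the task sleeps waiting for I/O */
    bool seen_in_iowait; /* what the model scheduler observed */
};

/* Stand-in for schedule(): it only records whether the task was
 * flagged as in_iowait at the moment it was switched out. */
void model_schedule(struct task_model *t)
{
    t->seen_in_iowait = t->in_iowait;
}

/* Model of the pattern: save the old flag, set it to 1, switch out,
 * and restore the old value once the task runs again. */
void model_io_schedule(struct task_model *t)
{
    bool old_iowait = t->in_iowait;

    t->in_iowait = true;
    model_schedule(t);
    t->in_iowait = old_iowait;
}
```

Restoring the saved value instead of clearing the flag keeps nested use correct: a task that was already marked as waiting for I/O stays marked afterwards.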

This shows that in_iowait indicates whether the task is in iowait.

In addition, note the task state. mutex_lock_io and mutex_lock_io_nested set the task state to TASK_UNINTERRUPTIBLE themselves, while for io_schedule and io_schedule_timeout the kernel sets the task state to TASK_UNINTERRUPTIBLE before calling them.

When the context-switch function __schedule switches a task out, if in_iowait of the outgoing task is set, nr_iowait in the CPU's run queue structure (rq) is incremented by one.

Because the task was set to TASK_UNINTERRUPTIBLE earlier, it has to be woken up explicitly, and the matching decrement of nr_iowait is done in the task wakeup function.

This shows that nr_iowait indicates whether a CPU has tasks in iowait, and how many.
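The counting described above can be modeled in a few lines of userspace C. This is a sketch under stated assumptions: struct task_model, struct rq_model, model_switch_out, and model_wake_up are hypothetical names, and the model ignores locking and atomics.

```c
/* Hypothetical, simplified models of task_struct and struct rq. */
struct task_model {
    int in_iowait;          /* 1 while the task waits for I/O */
    int cpu;                /* CPU whose run queue counted the task */
};

struct rq_model {
    unsigned int nr_iowait; /* tasks of this CPU sleeping in iowait */
};

/* Switch-out path: count the task if it is waiting for I/O. */
void model_switch_out(struct rq_model *rqs, const struct task_model *t)
{
    if (t->in_iowait)
        rqs[t->cpu].nr_iowait++;
}

/* Wakeup path: undo the increment made at switch-out. */
void model_wake_up(struct rq_model *rqs, const struct task_model *t)
{
    if (t->in_iowait)
        rqs[t->cpu].nr_iowait--;
}
```

Each increment is paired with exactly one decrement on the same run queue, so the counter always equals the number of sleeping iowait tasks that belong to that CPU.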

Because a task in iowait is in the TASK_UNINTERRUPTIBLE state, it is not on the ready queue, so CPU load balancing never migrates it to another CPU. As a result, nr_iowait does not have to deal with load balancing.

When the system accumulates idle time, if the CPU's nr_iowait is nonzero, that is, the CPU has tasks waiting in iowait, the time is recorded as iowait time.

In kernels with NO_HZ enabled, the relevant code is in update_ts_time_stats.

If NO_HZ is not enabled, the code is in account_idle_time.
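The accounting rule above, idle time goes to iowait when nr_iowait is nonzero, can be sketched as a small userspace model. struct cpu_time_model and model_account_idle are hypothetical names; this is an illustration of the rule, not the body of update_ts_time_stats or account_idle_time.

```c
#include <stdint.h>

/* Hypothetical per-CPU counters for the model. */
struct cpu_time_model {
    unsigned int nr_iowait; /* tasks of this CPU sleeping in iowait */
    uint64_t idle_ns;       /* idle time with no task waiting for I/O */
    uint64_t iowait_ns;     /* idle time with at least one such task */
};

/* Account delta_ns of time during which the CPU ran only its idle task. */
void model_account_idle(struct cpu_time_model *c, uint64_t delta_ns)
{
    if (c->nr_iowait > 0)
        c->iowait_ns += delta_ns;
    else
        c->idle_ns += delta_ns;
}
```

In this model the two counters are mutually exclusive: any given stretch of idle time lands in exactly one of them, depending on nr_iowait at accounting time.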

When the corresponding /proc/stat interface is read, get_iowait_time fetches this time and returns it.
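From userspace, the value ends up as the fifth number on the "cpu" lines of /proc/stat, right after the idle field. The following is a minimal sketch of extracting both fields from one such line; parse_cpu_line is a hypothetical helper name, and the sample numbers used with it are made up.

```c
#include <stdio.h>
#include <string.h>

/* Parse one "cpu" line in /proc/stat format and extract the idle and
 * iowait fields (4th and 5th numbers, in USER_HZ ticks).
 * Returns 0 on success, -1 if the line does not match. */
int parse_cpu_line(const char *line, unsigned long long *idle,
                   unsigned long long *iowait)
{
    unsigned long long user, nice, system;
    char label[16];

    if (sscanf(line, "%15s %llu %llu %llu %llu %llu",
               label, &user, &nice, &system, idle, iowait) != 6)
        return -1;
    if (strncmp(label, "cpu", 3) != 0)
        return -1;
    return 0;
}
```

A caller would typically read the first line of /proc/stat with fgets and pass it to parse_cpu_line. The counters are cumulative since boot, so computing an iowait percentage requires two samples and the difference between them.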

In summary, iowait time is CPU idle time, but during it the CPU is not idle because it has no tasks at all: one or more of its sleeping tasks are waiting for I/O.

Of course, in both cases, idle and iowait, the task actually running on the CPU is the idle task.

Finally, as further reading, I recommend an article from the Alibaba (Ali) kernel team: Kernel Documents/new iowait calculation.

The more interesting part is here:

+    wait_event_interruptible_hrtimeout(ctx->wait,
+            aio_read_events(ctx, min_nr, nr, event, &ret), until);

Regardless of the value of the timeout until, wait_event_interruptible_hrtimeout is always called. Although hrtimers are already very precise, the macro __wait_event_hrtimeout, which actually implements the wait, starts its hrtimer like this:

    hrtimer_start_range_ns(&__t.timer, timeout,       \
                           current->timer_slack_ns,   \
                           HRTIMER_MODE_REL);         \

The third argument, current->timer_slack_ns, is the slack range passed to the hrtimer. hrtimers are very precise, but the system obviously cannot afford to have them fire too frequently, so each time an hrtimer fires, all timers whose time range has been reached are processed together (see __hrtimer_run_queues). So timeout + current->timer_slack_ns is the latest time at which this hrtimer fires. The default value of current->timer_slack_ns is 50000, which means 50,000 nanoseconds. That is, with a timeout of 0, this timer fires 50,000 nanoseconds later at the latest, and it may fire earlier, together with another hrtimer that expires before then.
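The arithmetic above can be made concrete with a small worked example in C. This is a sketch only: struct timer_window and model_timer_window are hypothetical names that model the range described in the text, not kernel data structures.

```c
#include <stdint.h>

/* Hypothetical model of the range a timer is given: it may fire no
 * earlier than soft_ns and no later than hard_ns. */
struct timer_window {
    uint64_t soft_ns; /* earliest expiry: now + timeout */
    uint64_t hard_ns; /* latest expiry: now + timeout + slack */
};

/* Compute the expiry range for a relative timeout and a slack value,
 * all in nanoseconds. */
struct timer_window model_timer_window(uint64_t now_ns, uint64_t timeout_ns,
                                       uint64_t slack_ns)
{
    struct timer_window w;

    w.soft_ns = now_ns + timeout_ns;
    w.hard_ns = w.soft_ns + slack_ns;
    return w;
}
```

With a timeout of 0 and the default slack of 50000, model_timer_window(now, 0, 50000) yields a range that starts at now and ends 50,000 nanoseconds later, which is why even a zero timeout can leave the task sleeping for up to 50 microseconds. On a running system, a thread can query its own slack with prctl(PR_GET_TIMERSLACK).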

So in wait_event_interruptible_hrtimeout, whenever the condition awaited on ctx->wait is not yet satisfied, schedule is very likely to be called once even if the timeout is set to 0. This makes the iowait time fluctuate widely, and it also hurts performance badly.

This problem has since been fixed by commit 5f785de588735306ec4d7c875caf9d28481c8b21, which changed the code to:

-    wait_event_interruptible_hrtimeout(ctx->wait,
-            aio_read_events(ctx, min_nr, nr, event, &ret), until);
+    if (until.tv64 == 0)
+        aio_read_events(ctx, min_nr, nr, event, &ret);
+    else
+        wait_event_interruptible_hrtimeout(ctx->wait,
+                aio_read_events(ctx, min_nr, nr, event, &ret),
+                until);

Thus, when until is 0, aio_read_events is called directly and no timer is set up. There should be no more obvious iowait problem in this case, and the fix gives io_getevents calls a performance improvement of more than 100 times.

Of course, the underlying reason why this iowait figure is inaccurate is still there. Whenever a task switch is needed, the inaccuracy appears again.

Finally, I want to gripe about the design of aio: it is supposed to be asynchronous I/O, and it still waits?
