My first solution for Tim's exercise tried to read the file in parallel, but I just realized, by reading the file module's source code, that file:open(FileName, Options) returns a process rather than an IO device. This means a lot:
- It's a process, so when you request more data from it, you actually send it a message. Since you only send two integers, the offset and the length, sending the message should be very fast. But then this process (File) will wait to receive data from disk I/O. Within one process, the receiving is sequential rather than parallel.
- If we look at Erlang processes as ActiveObjects, which send/receive messages/data asynchronously, then since receiving is sequential within one process, requesting/waiting around one process (or object) is almost safe for parallel programming; you usually do not need to worry about locking/unlocking etc. (except for the outside world).
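To illustrate the point above, here is a minimal sketch (the module name active_obj and the request format are my own invention, not part of the file module): a single process serves requests strictly one at a time from its mailbox, so concurrent callers get consistent replies without any explicit locking.

```erlang
%% Hypothetical sketch of the "active object" idea: one server process,
%% requests handled sequentially in mailbox order.
-module(active_obj).
-export([start/0, request/2]).

start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, Ref, Req} ->
            %% Handle exactly one request at a time; no lock needed,
            %% because only this process touches its own state.
            From ! {Ref, {done, Req}},
            loop()
    end.

request(Pid, Req) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Req},
    receive
        {Ref, Reply} -> Reply
    end.
```

Many client processes can call request/2 concurrently; each blocks only on its own reply, while the server serializes the actual work, exactly as a File process serializes reads.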
- We can open many File processes to read data in parallel; the bound is the disk I/O and the OS's resource limits.
I wrote some code to test reading a file in parallel, discarding the disk cache. On my 2-core MacBook, reading the file with two processes runs nearly 2x as fast as with one process.
The code:
-module(file_pread).

-compile([native]).

-export([start/2]).

-include_lib("kernel/include/file.hrl").

start(FileName, ProcNum) ->
    [start(FileName, ProcNum, Fun) || Fun <- [fun read_file/3,
                                              fun pread_file/3]].

start(FileName, ProcNum, Fun) ->
    Start = now(),
    Main = self(),
    Collector = spawn(fun () -> collect_loop(Main) end),
    Fun(FileName, ProcNum, Collector),
    receive
        stop -> io:format("time: ~10.2f ms~n",
                          [timer:now_diff(now(), Start) / 1000])
    end.

collect_loop(Main) -> collect_loop_1(Main, undefined, 0).

collect_loop_1(Main, ChunkNum, ChunkNum) ->
    Main ! stop;
collect_loop_1(Main, ChunkNum, ProcessedNum) ->
    receive
        {chunk_num, ChunkNumX} ->
            collect_loop_1(Main, ChunkNumX, ProcessedNum);
        {seq, _Seq} ->
            collect_loop_1(Main, ChunkNum, ProcessedNum + 1)
    end.

get_chunk_size(FileName, ProcNum) ->
    {ok, #file_info{size=Size}} = file:read_file_info(FileName),
    Size div ProcNum.

read_file(FileName, ProcNum, Collector) ->
    ChunkSize = get_chunk_size(FileName, ProcNum),
    {ok, File} = file:open(FileName, [raw, binary]),
    read_file_1(File, ChunkSize, 0, Collector).

read_file_1(File, ChunkSize, I, Collector) ->
    case file:read(File, ChunkSize) of
        eof ->
            file:close(File),
            Collector ! {chunk_num, I};
        {ok, _Bin} ->
            Collector ! {seq, I},
            read_file_1(File, ChunkSize, I + 1, Collector)
    end.

pread_file(FileName, ProcNum, Collector) ->
    ChunkSize = get_chunk_size(FileName, ProcNum),
    pread_file_1(FileName, ChunkSize, ProcNum, Collector).

pread_file_1(FileName, ChunkSize, ProcNum, Collector) ->
    [spawn(fun () ->
               %% The last chunk reads up to ChunkSize * 2 bytes,
               %% so the remainder after Size div ProcNum is covered.
               Length = if I == ProcNum - 1 -> ChunkSize * 2;
                           true -> ChunkSize
                        end,
               {ok, File} = file:open(FileName, [read, binary]),
               {ok, _Bin} = file:pread(File, ChunkSize * I, Length),
               Collector ! {seq, I},
               file:close(File)
           end) || I <- lists:seq(0, ProcNum - 1)],
    Collector ! {chunk_num, ProcNum}.
The pread_file/3 function is parallelized: it opens a new File process for each reading process instead of sharing one opened File process among all reading processes. The read_file/3 function is not parallelized.
To evaluate (run each test at least twice to average out disk/IO caches):
$ erlc -smp file_pread.erl
$ erl -smp
1> file_pread:start("o100k.ap", 2).
time: 691.72 ms
time: 44.37 ms
[ok,ok]
2> file_pread:start("o100k.ap", 2).
time: 74.50 ms
time: 43.59 ms
[ok,ok]
3> file_pread:start("o1000k.ap", 2).
time: 1717.68 ms
time: 408.48 ms
[ok,ok]
4> file_pread:start("o1000k.ap", 2).
time: 766.00 ms
time: 393.71 ms
[ok,ok]
5>
Let's compare the results for each file (picking the second test result of each). The speedup:
- o100k.ap, 20M: 74.50 / 43.59 - 1 = 70%
- o1000k.ap, 200M: 766.00 / 393.71 - 1 = 95%
On another 4-CPU debian machine, with 4 processes, the best result I got:
4> file_pread:start("o1000k.ap", 4).
time: 768.59 ms
time: 258.57 ms
[ok, ok]
5>
The parallelized reading speedup is 768.59 / 258.57 - 1 = 197%.
I've updated my first solution according to this testing, opening a new File process for each reading process instead of sharing the same File process. Of course, there are still the issues I pointed out in Tim Bray's Erlang Exercise on Large Dataset Processing - Round II.
Although the above results can also be achieved in other languages, I find that coding parallelization in Erlang is a pleasure.