- 浏览: 299886 次
文章分类
最新评论
-
shoru:
最新的netbeans scala插件无法安装到6.9中呀,求 ...
Scala-对Java的修正和超越-Presentation -
alanwu:
night_stalker 写道讲一次多少钱啊,能透露下么…… ...
Scala-对Java的修正和超越-Presentation -
wzjin:
暂时没有考虑学习,等有一定市场再看!
Scala-对Java的修正和超越-Presentation -
dcaoyuan:
jasspier 写道对scala有点兴趣,有没有入门的推荐教 ...
Scala-对Java的修正和超越-Presentation -
jasspier:
草原兄,听了你的课,对scala有点兴趣,有没有入门的推荐教程 ...
Scala-对Java的修正和超越-Presentation
Tim Bray's Erlang Exercise on Large Dataset Processing - Round II
- 博客分类:
- /Erlang
Updated Oct 09: Added more benchmark results under linux on other machines.
Updated Oct 07: More concise code.
Updated Oct 06: Fixed bugs: 1. Match "GET /ongoing/When/" instead of "/ongoing/When/"; 2. split_on_last_newline should not reverse Tail.
Backed from a short vacation, and sit down in front of my computer, I'm thinking about Tim Bray's exercise again.
As I realized, the most expensive procedure is splitting dataset to lines. To get the multiple-core benefit, we should parallelize this procedure instead of reading file to binary or macthing process only.
In my previous solution, there are at least two issues:
- Since the file reading is fast in Erlang, then, parallelizing the file reading is not much helpful.
- The buffered_read actually can be merged with the buffered file reading.
And, Per's solution parallelizes process_match procedure only, based on a really fast divide_to_lines, but with hacked binary matching syntax.
After a couple of hours working, I finially get the second version of tbray.erl (with some code from Per's solution).
- Read file to small pieces of binary (about 4096 bytes each chunk), then convert to list.
- Merge the previous tail for each chunk, search this chunk from tail, find the last new line mark, split this chunk to line-bounded data part, and tail part for next chunk.
- The above steps are difficult to parallelize. If we try, there will be about 30 more LOC, and not so readable.
- Spawn a new process at once to split line-bounded chunk to lines, process match and update dict.
- Thus we can go on reading file with non-stop.
- A collect_loop will receive dicts from each process, and merge them.
What I like of this version is, it scales on mutiple-core almost linearly! On my 2.0G 2-core MacBook, it took about 13.522 seconds with non-smp, 7.624 seconds with smp enabled (for a 200M data file, with about 50,000 processes spawned). The 2-core smp result achieves about 77% faster than non-smp result. I'm not sure how will it achieve on an 8-core computer, but we'll finally reach the limit due to the un-parallelized procedures.
The Erlang time results:
$ erlc tbray.erl $ time erl -noshell -run tbray start o1000k.ap -s erlang halt > /dev/null real 0m13.522s user 0m12.265s sys 0m1.199s $ erlc -smp tbray.erl $ time erl -smp +P 60000 -noshell -run tbray start o1000k.ap -s erlang halt > /dev/null real 0m7.624s user 0m13.302s sys 0m1.602s # For 5 million lines, 958.4M size: $ time erl -smp +P 300000 -noshell -run tbray start o5000k.ap -s erlang halt > /dev/null real 0m37.085s user 1m5.605s sys 0m7.554s
And the original Tim's Ruby version:
$ time ruby tbray.rb o1000k.ap > /dev/null real 0m2.447s user 0m2.123s sys 0m0.306s # For 5 million lines, 958.4M size: $ time ruby tbray.rb o5000k.ap > /dev/null real 0m12.115s user 0m10.494s sys 0m1.473s
Erlang time result on 2-core 1.86GHz CPU RedHat linux box, with kernel:
Linux version 2.6.18-1.2798.fc6 (brewbuilder@hs20-bc2-4.build.redhat.com) (gcc v
ersion 4.1.1 20061011 (Red Hat 4.1.1-30)) #1 SMP Mon Oct 16 14:37:32 EDT 2006
is 7.7 seconds.
Erlang time result on 2.80GHz 4-cpu xeon debian box, with kernel:
Linux version 2.6.15.4-big-smp-tidy (root@test) (gcc version 4.0.3 20060128 (prerelease) (Debian 4.0
.2-8)) #1 SMP Sat Feb 25 21:24:23 CST 2006
The smp result on this 4-cpu computer is questionable. It speededup only 50% than non-smp, even worse than my 2.0GHz 2-core MacBook. I also tested the Big Bang on this machine, it speedup less than 50% too.
$ erlc tbray.erl $ time erl -noshell -run tbray start o1000k.ap -s erlang halt > /dev/null real 0m22.279s user 0m21.597s sys 0m0.676s $ erlc -smp tbray.erl $ time erl -smp +S 4 +P 60000 -noshell -run tbray start o1000k.ap -s erlang halt > /dev/null real 0m14.765s user 0m28.722s sys 0m0.840s
Notice:
- All tests run several times to have the better result expressed, so, the status of disk/io cache should be near.
- You may need to compile tbray.erl to two different BEAMs, one for smp version, and one for no-smp version.
- If you'd like to process bigger file, you can use +P processNum to get more simultaneously alive Erlang processes. For BUFFER_SIZE=4096, you can set +P arg as FileSize / 4096, or above. From Erlang's Efficiency Guide:
Processes
The maximum number of simultaneously alive Erlang processes is by default 32768. This limit can be raised up to at most 268435456 processes at startup (see documentation of the system flag +P in the erl(1) documentation). The maximum limit of 268435456 processes will at least on a 32-bit architecture be impossible to reach due to memory
To evaluate with smp enable: (Erlang/OTP R11B-5 for Windows may not support smp yet)
erl -smp +P 60000 > tbray:start("o1000k.ap").
The code: (pretty formatted by ErlyBird 0.15.1)
-module(tbray_blog). -compile([native]). -export([start/1]). %% The best Bin Buffer Size is 4096 -define(BUFFER_SIZE, 4096). start(FileName) -> Start = now(), Main = self(), Collector = spawn(fun () -> collect_loop(Main) end), {ok, File} = file:open(FileName, [raw, binary]), read_file(File, Collector), %% don't terminate, wait here, until all tasks done. receive stop -> io:format("Time: ~10.2f ms~n", [timer:now_diff(now(), Start) / 1000]) end. read_file(File, Collector) -> read_file_1(File, [], 0, Collector). read_file_1(File, PrevTail, I, Collector) -> case file:read(File, ?BUFFER_SIZE) of eof -> Collector ! {chunk_num, I}, file:close(File); {ok, Bin} -> {Data, NextTail} = split_on_last_newline(PrevTail ++ binary_to_list(Bin)), spawn(fun () -> Collector ! {dict, scan_lines(Data)} end), read_file_1(File, NextTail, I + 1, Collector) end. split_on_last_newline(List) -> split_on_last_newline_1(lists:reverse(List), []). split_on_last_newline_1(List, Tail) -> case List of [] -> {lists:reverse(List), []}; [$\n|Rest] -> {lists:reverse(Rest), Tail}; [C|Rest] -> split_on_last_newline_1(Rest, [C | Tail]) end. collect_loop(Main) -> collect_loop_1(Main, dict:new(), undefined, 0). collect_loop_1(Main, Dict, ChunkNum, ChunkNum) -> print_result(Dict), Main ! stop; collect_loop_1(Main, Dict, ChunkNum, ProcessedNum) -> receive {chunk_num, ChunkNumX} -> collect_loop_1(Main, Dict, ChunkNumX, ProcessedNum); {dict, DictX} -> Dict1 = dict:merge(fun (_, V1, V2) -> V1 + V2 end, Dict, DictX), collect_loop_1(Main, Dict1, ChunkNum, ProcessedNum + 1) end. print_result(Dict) -> SortedList = lists:reverse(lists:keysort(2, dict:to_list(Dict))), [io:format("~p\t: ~s~n", [V, K]) || {K, V} <- lists:sublist(SortedList, 10)]. scan_lines(List) -> scan_lines_1(List, [], dict:new()). scan_lines_1(List, Line, Dict) -> case List of [] -> match_and_update_dict(lists:reverse(Line), Dict); [$\n|Rest] -> scan_lines_1(Rest, [], match_and_update_dict(lists:reverse(Line), Dict)); [C|Rest] -> scan_lines_1(Rest, [C | Line], Dict) end. match_and_update_dict(Line, Dict) -> case process_match(Line) of false -> Dict; {true, Word} -> dict:update_counter(Word, 1, Dict) end. process_match(Line) -> case Line of [] -> false; "GET /ongoing/When/"++[_,_,_,$x,$/,Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/|Rest] -> case match_until_space(Rest, []) of [] -> false; Word -> {true, [Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/] ++ Word} end; [_|Rest] -> process_match(Rest) end. match_until_space(List, Word) -> case List of [] -> []; [$.|_] -> []; [$ |_] -> lists:reverse(Word); [C|Rest] -> match_until_space(Rest, [C | Word]) end.
Lessons learnt:
- Split large binary to proper size chunks, then convert to list for further processing
- Parallelize the most expensive part (of course)
- We need a new or more complete Efficent Erlang
发表评论
-
Updated "From Rails to Erlyweb" Part I and Part II
2007-07-30 00:13 836I've updated "From Rails t ... -
From Rails to Erlyweb - Part IV
2007-07-30 00:14 878IV. Support Mysql Spatial Exten ... -
HTML Entity Refs and xmerl
2007-07-30 00:14 936According to [erlang-bugs] xme ... -
From Rails to Erlyweb - Part III
2007-07-30 00:14 8383. The Magic Behind Erlyweb Wit ... -
From Rails to Erlyweb - Part I
2007-08-05 06:34 1199Updated July 20 2007: new param ... -
ErlyBird 0.12.0 released - Erlang IDE based on NetBeans
2007-08-08 18:40 865I'm pleased to announce ErlyBir ... -
A Simple POET State Machine Accepting SAX Events to Build Plain Old Erlang Term
2007-08-20 07:19 960Per previous blogs: A Simple ... -
A Simple XML State Machine Accepting SAX Events to Build xmerl Compitable XML Tree: icalendar demo
2007-08-20 07:23 1101xmerl is a full XML functionali ... -
recbird - An Erlang Dynamic Record Inferring Parse Transform
2007-08-23 08:31 1019You should have read Yariv's re ... -
From Rails to Erlyweb - Part II
2007-08-23 21:16 1229Updated Aug 23: Please see Fro ... -
From Rails to Erlyweb - Part II Manage Project - Reloaded
2007-08-24 19:20 1318The migrating from Rails to Erl ... -
Parse JSON to xmerl Compitable XML Tree via A Simple XML State Machine
2007-09-02 20:00 1494Updated Aug 16: Fix bugs when j ... -
Tim Bray's Erlang Exercise on Large Dataset Processing
2007-10-10 20:12 1211Updated Oct 10: pread_file/5 s ... -
Reading File in Parallel in Erlang (Was Tim Bray's Erlang Exercise - Round III)
2007-10-15 19:56 1866My first solution for Tim's ex ... -
The Erlang Way (Was Tim Bray's Erlang Exercise - Round IV)
2007-10-17 07:45 1760Playing with Tim's Erlang Exerc ... -
Learning Coding Parallelization (Was Tim's Erlang Exercise - Round V)
2007-10-17 21:58 1531Updated Oct 16: After testing ...
相关推荐
之前我们介绍了计算beta-NTI(beta nearest taxon index)来进行群落构建分析。|beta-NTI| >2说明决定性过程主导,其中beta-NTI >2说明OTU的遗传距离发散,为生物交互作用主导,beta-NTI < -2则说明OUT的遗传距离...
BRAY VAL-SRS系列电动执行器说明书pdf,BRAY VAL-SRS系列电动执行器说明书
Chapter 4, Finding Things, by Tim Bray, draws together many strands in Computer Science in an exploration of a problem that is fundamental to many computing tasks. Chapter 5, Correct, Beautiful, Fast ...
Self-modeling as a treatment for increasing on-task behavior SELF-MODELING AS A TREATMENT FOR INCREASING ON-TASK BEHAVIOR susan k. clare Clovis Unified School District, Clovis, California ...
Steven Robbins的文章指出,图灵奖得主Jim Gray很早就提出了“内存将成为硬盘,硬盘将成为磁带”的说法(出自2006年Tim Bray一篇讨论网格计算的博客,2003年的访谈中他已经表达了同样的意思)。2008年Dare Obsanjo...
We obtain the isoperimetric ...ing recent work of Bray and Morgan on isoperimetric comparison. We then discuss these results in the context of Bray’s isoperimetric approach to the Penrose inequality.
Tim Bray:2014年软件之路 后端架构 MongoDB与内存 《淘宝技术这十年》读书笔记 - 大CC 探索 Hibernate 新 TableGenerator 机制 服务好“最后一公里”,高效CDN架构经验 探索推荐引擎内部的秘密 一起 select 引起的...
Good book on Algebraic topology, it is used for graduate student in mathematics and other science fields with solid mathematical background and knowledge to read or use as a reference.
【原 书 名】 Bluetooth Application Developer's Guide:The Short Range Interconnect Solution 【原出版社】 Syngress Publishing 【作 者】David Kammer Gordon McNutt Brian Senese Jennifer Bray [同作者...
它解析相对URL(例如Tim Bray的“进行中”看到的URL)。 它可以正确处理XML名称空间(包括在非常规feed中为主要feed元素定义非默认名称空间的XML名称空间)。 安装 npm install feedparser 用法 本示例只是为了...
Self-modeling as a treatment for increasing on-task behavior SELF-MODELING AS A TREATMENT FOR INCREASING ON-TASK BEHAVIOR susan k. clare Clovis Unified School District, Clovis, California ...
Web 2.0人物访谈录。Wiley Web 2.0 Heroes Interviews ...19 Bob Brewin & Tim Bray: Sun Microsystems . . . . . . . . . . . . . . 229 20 Michele Turner:Adobe Systems Incorporated . . . . . . . . . . . . 243
Bertha是用于高端存储子系统的I / O基准测试工具。 基于Tim Bray的Bonnie基准测试工具,并进行了三项增强:1)生成更多的I / O; 2)提供重播I / O事务的工具;以及3)广泛的指标报告。
法律障碍状态代码 这是用于报告法律障碍的 HTTP 状态代码的拟议规范的... 原始网址: 此存储库用于对 Tim Bray 的草稿进行提议的更改。 具有拟议更改的当前规范位于: 通过标准的 fork 和 pull 方法提交任何更改。
该软件套件是使用Java和面向对象的程序编写的,使用Bray-Curtis相似性可识别与给定基因具有相似组织范围表达谱的其他基因,并使用敲除数据来识别转录因子靶基因。 它还使用基于信息论的位置权重矩阵对基因启动子中的...
Acknowledgments xiii Introduction xv 1. Making Games the Modular Way 1 1.1 Important Programming Concepts.....................................2 1.1.1 Manager and Controller Scripts.......................
P的形态根据Wilson的顺序萃取法进行分析,并使用Bray No. 2和Truog方法进行测定。 土壤中主要存在两种化学形式,即铁和锰的氢氧化物(Fe-Mn-P)以及有机和生物型的磷(Org-P),它们很容易通过改变土壤氧化还原条件...
常用的成熟的生成高斯分布随机数序列的方法由Marsaglia和Bray在1964年提出,C++版本如下: 代码如下:#include <stdlib>#include <math.h> double gaussrand(){ static double V1, V2, S; static int phase =...
电动执行器pdf,电动执行器
The intent of this article was to present an on-going line of research that has focused on the design of an effective, easily implemented, economical, and parsimonious treatment for disruptive class