annotate zlib/contrib/asm586/README.586 @ 51:ae3a4bfb450b

add some files of version 4.4.3 that have been forgotten.
author kent <kent@cr.ie.u-ryukyu.ac.jp>
date Sun, 07 Feb 2010 18:27:48 +0900
This is a patched version of zlib modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

    README.586
    match.S

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
there is no real way to work around without rewriting the basic
algorithm. The speedup on average is around 5-10% (which is generally
less than the amount of variance between subsequent executions).
However, when used at level 9 compression, the cache contention can
drop enough for the assembly version to achieve 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that better improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.
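
For readers unfamiliar with what match.S speeds up: it supplies an
assembly replacement for deflate's match-search routine (longest_match
in zlib's deflate.c). The C sketch below is only a simplified,
hypothetical stand-in for that kind of search, not zlib's code; the
name naive_longest_match and its parameters are invented for
illustration, and it has no hash chains or tuning parameters. It does
show that the inner loop is mostly byte loads and compares over the
input window, which may help explain why memory behaviour rather than
instruction scheduling tends to dominate.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified, hypothetical longest-match search (NOT zlib's
 * longest_match): brute force over every earlier position.
 *
 * buf, len : input buffer and its length
 * cur      : position whose upcoming bytes we try to match
 * max_len  : longest match length to consider
 * out_pos  : receives the start of the best earlier match; left
 *            untouched when no match is found
 *
 * Returns the best match length, or 0 if nothing matches.
 */
size_t naive_longest_match(const unsigned char *buf, size_t len,
                           size_t cur, size_t max_len, size_t *out_pos)
{
    size_t best_len = 0;
    size_t cand;

    assert(buf != NULL && out_pos != NULL && cur < len);

    for (cand = 0; cand < cur; cand++) {
        size_t n = 0;

        /* Inner loop: one load from each position per iteration.
         * cand < cur, so buf[cand + n] stays in bounds whenever
         * buf[cur + n] does. */
        while (n < max_len && cur + n < len &&
               buf[cand + n] == buf[cur + n])
            n++;

        if (n > best_len) {
            best_len = n;
            *out_pos = cand;
        }
    }
    return best_len;
}
```

Calling it on the 10-byte buffer "abcabcabcx" with cur = 3 and
max_len = 258 finds a match of length 6 starting at position 0.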

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then do:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
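
As a rough illustration of what the two commands arrange: defining
ASMV makes the C source leave out its own match routine and expect an
external one, and OBJA=match.o adds the assembled object to the build
so that the external one exists. The sketch below mimics that switch
in miniature. Only the macro name ASMV comes from the commands above;
demo_match_len and demo_match_backend are hypothetical names, and this
is not zlib's source.

```c
#include <assert.h>
#include <stddef.h>

#ifdef ASMV
/* Assembly build: declaration only. The definition would have to come
 * from a separately assembled object file at link time. */
unsigned demo_match_len(const unsigned char *a, const unsigned char *b,
                        unsigned max_len);
#else
/* Portable C fallback, compiled when ASMV is not defined: counts how
 * many leading bytes of a and b agree, up to max_len. */
unsigned demo_match_len(const unsigned char *a, const unsigned char *b,
                        unsigned max_len)
{
    unsigned n = 0;

    assert(a != NULL && b != NULL);
    while (n < max_len && a[n] == b[n])
        n++;
    return n;
}
#endif

/* Reports which variant this translation unit was built to use. */
const char *demo_match_backend(void)
{
#ifdef ASMV
    return "asm";
#else
    return "c";
#endif
}
```

Built without -DASMV, demo_match_backend() returns "c" and the
fallback is used. Built with -DASMV and no matching object file, the
link fails with an undefined symbol, which is the same kind of error
to expect from zlib if -DASMV is given but OBJA=match.o is forgotten.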