C# - Dealing with very large Lists on x86
I need to work with large lists of floats, but I am hitting memory limits on x86 systems. I do not know the final length, so I need to use an expandable type. On x64 systems I can use <gcAllowVeryLargeObjects>.
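For reference, a minimal sketch of how that x64-only setting is enabled in app.config (it requires .NET 4.5 or later and has no effect in a 32-bit process):

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>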
My current data type:
List<RawData> param1 = new List<RawData>();
List<RawData> param2 = new List<RawData>();
List<RawData> param3 = new List<RawData>();

public class RawData
{
    public string Name;
    public List<float> Data;
}
The length of the paramN lists will be low (currently 50 or lower), but Data can be 10M+ elements long. When the length is 50, I am hitting memory limits (OutOfMemoryException) at just above 1M data points, and when the length is 25, I hit the limit at just above 2M data points. (If my calculations are right, that is only about 200MB of float data: 50 lists × 1M floats × 4 bytes, plus the size of Name, plus overhead.) What can I use to increase this limit?
Edit: I tried using List<List<float>> with a maximum inner list size of 1 << 17 (131072), which increased the limit somewhat, but still not as far as I want.
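For illustration, a minimal sketch of that approach with the 1 << 17 cap per inner list (the class and method names are hypothetical, not from my real code):

using System.Collections.Generic;

public class ChunkedFloats
{
    private const int MaxInnerSize = 1 << 17;   // 131072 floats (~512 KB) per inner list
    private readonly List<List<float>> _lists = new List<List<float>>();

    public void Add(float value)
    {
        // Start a new inner list when there is none yet or the last one is full.
        if (_lists.Count == 0 || _lists[_lists.Count - 1].Count == MaxInnerSize)
        {
            _lists.Add(new List<float>(MaxInnerSize));
        }
        _lists[_lists.Count - 1].Add(value);
    }
}

Pre-sizing each inner list avoids repeated doubling of its backing array, although at 131072 floats each backing array is still roughly 512 KB and therefore a large object.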
Edit 2: I tried reducing the chunk size in the List<List<float>> to 8192, and got OOM at ~2.3M elements, with Task Manager reading ~1.4GB for the process. It looks like I need to reduce memory usage between the data source and the storage, or trigger GC more often - I was able to gather 10M data points in an x64 process on a PC with 4GB RAM, and IIRC the process never went over 3GB.
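As an aside on the "trigger GC more" idea, a hedged sketch of forcing a collection together with a one-off LOH compaction (the compaction setting requires .NET 4.5.1 or later; whether it actually helps depends on how fragmented the LOH is):

using System;
using System.Runtime;

static class GcHelper
{
    // Hypothetical helper: ask the runtime to compact the large object heap
    // on the next full, blocking collection.
    public static void CompactLargeObjectHeap()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}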
Edit 3: I condensed my code down to the parts that handle the data: http://pastebin.com/mayckk84
Edit 4: I had a look in dotMemory, and found that my data structure does take up ~1GB with the settings I was testing on (50ch * 3 params * 2M events = 300,000,000 float elements). I guess I will need to limit it on x86, or figure out how to write to disk in this format as I receive the data.
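On the "write to disk as data arrives" idea, a minimal sketch of streaming samples straight to a binary file instead of holding them all in memory (the class name and layout are hypothetical, just to show the shape of the approach):

using System;
using System.IO;

public class DiskBackedChannel : IDisposable
{
    private readonly BinaryWriter _writer;

    public DiskBackedChannel(string path)
    {
        // A plain FileStream is buffered, so per-sample writes stay cheap.
        _writer = new BinaryWriter(new FileStream(path, FileMode.Create, FileAccess.Write));
    }

    public void Add(float sample)
    {
        _writer.Write(sample);   // 4 bytes per sample, appended sequentially
    }

    public void Dispose()
    {
        _writer.Dispose();
    }
}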
First of all, on x86 systems the memory limit is 2GB, not 200MB. I presume your problem is much trickier than that: you have aggressive LOH (Large Object Heap) fragmentation.
The CLR uses different heaps for small and large objects. An object is large if its size is more than 85,000 bytes. The LOH is a fractious thing: it is not eager to return unused memory to the OS, and it is very poor at defragmentation.
.NET's List<T> is an implementation of the ArrayList data structure: it stores its elements in an array, which has a fixed size; when the array is filled, a new array of double the size is created. That continuous growth of the backing array with your amount of data is a "starvation" scenario for the LOH.
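To put numbers on that (assuming the usual 85,000-byte threshold and 4-byte floats): a float[] counts as a large object once it holds roughly 85,000 / 4 = 21,250 elements, and since List<float> doubles its backing array (... 16,384 -> 32,768 -> 65,536 -> ...), every reallocation from a capacity of 32,768 onward happens on the LOH, leaving the previous array behind as a hole.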
So, you have to use a tailor-made data structure to suit your needs, e.g. a list of chunks, with each chunk small enough not to get onto the LOH. Here is a small prototype:
public class ChunkedList
{
    private readonly List<float[]> _chunks = new List<float[]>();
    private const int ChunkSize = 8000;
    private int _count = 0;

    public void Add(float item)
    {
        int chunk = _count / ChunkSize;
        int ind = _count % ChunkSize;
        if (ind == 0)
        {
            _chunks.Add(new float[ChunkSize]);
        }
        _chunks[chunk][ind] = item;
        _count++;
    }

    public float this[int index]
    {
        get
        {
            if (index < 0 || index >= _count)
                throw new IndexOutOfRangeException();
            int chunk = index / ChunkSize;
            int ind = index % ChunkSize;
            return _chunks[chunk][ind];
        }
        set
        {
            if (index < 0 || index >= _count)
                throw new IndexOutOfRangeException();
            int chunk = index / ChunkSize;
            int ind = index % ChunkSize;
            _chunks[chunk][ind] = value;
        }
    }

    // other code you require
}
With ChunkSize = 8000, every chunk takes only 32,000 bytes, so it will not get onto the LOH. _chunks will only get onto the LOH when there are about 16,000 chunks in the collection, which means more than 128 million elements in the collection (about 500 MB).
UPD: I've performed some stress tests with the sample above. OS is x64, solution platform is x86, ChunkSize is 20000.
First:

var list = new ChunkedList();
for (int i = 0; ; i++)
{
    list.Add(0.1f);
}

OutOfMemoryException is raised at ~324,000,000 elements.
Second:

public class RawData
{
    public string Name;
    public ChunkedList Data = new ChunkedList();
}

var list = new List<RawData>();
for (int i = 0; ; i++)
{
    var raw = new RawData { Name = "Test" + i };
    for (int j = 0; j < 20 * 1000 * 1000; j++)
    {
        raw.Data.Add(0.1f);
    }
    list.Add(raw);
}

OutOfMemoryException is raised at i = 17, j ~ 12,000,000. So 17 RawData instances were created, with 20 million data points each: about 352 million data points in total.