2. The overhead of reading and writing extremely wide cases when you are doubtless not using more than a small fraction of them will limit performance. And you don't want to be paging the variable dictionary. If you have lots of RAM, you can probably reach between 32,000 and 100,000 variables before memory paging degrades performance seriously. (I would never use that many variables anyway.)
5. These points apply mainly to the number of variables. The number of cases is not subject to the same problems, because the cases are not generally all mapped into memory by SPSS (although Windows may cache them). However, there are some procedures that because of their computational requirements do have to hold the entire dataset in memory, so those would not scale well up to immense numbers of cases. (I suspect that even running a simple frequency distribution in SPSS on hundreds of millions of cases would be very difficult.)
Modern database practice would be to break up your variables into cohesive subsets and combine these with join (MATCH FILES in SPSS) operations when you need variables from more than one subset. SPSS is not a relational database, but working this way will be much more efficient and practical with very large numbers of variables. (Subsets can handle a large number of variables, but what about a large number of cases?)
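The idea of keeping variables in cohesive subsets and joining them only when needed can be sketched in a few lines of Python. This is only an illustration of the principle, not a reproduction of how MATCH FILES behaves in SPSS: the subset names, the variables, and the case IDs are all made up, and the join here keeps only cases that appear in every subset.

```python
# Hypothetical variable subsets, each keyed by the same case ID.
demographics = {
    1: {"age": 34, "sex": "F"},
    2: {"age": 51, "sex": "M"},
    3: {"age": 27, "sex": "F"},
}
income = {
    1: {"income": 42000},
    2: {"income": 58000},
    3: {"income": 31000},
}


def join_subsets(*subsets):
    """Combine variable subsets on their shared case ID (one-to-one match)."""
    # Keep only the case IDs that are present in every subset.
    common_ids = set(subsets[0])
    for subset in subsets[1:]:
        common_ids &= set(subset)

    joined = {}
    for case_id in sorted(common_ids):
        row = {}
        for subset in subsets:
            row.update(subset[case_id])  # add this subset's variables
        joined[case_id] = row
    return joined


combined = join_subsets(demographics, income)
print(combined[1])  # {'age': 34, 'sex': 'F', 'income': 42000}
```

Each analysis then loads only the subsets it actually needs, so the width of the working file stays small even when the total number of variables is large.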
For Stata, in a word: it depends on available memory. For details, see the FAQs by Kevin S. Turner, StataCorp:
1. Under all current 32-bit Windows operating systems (Windows 95, 98, ME, NT, 2000, XP, Vista), the total available address space for any application is 2.1 GB. If you have a dataset larger than 2.1 GB, you will not be able to load it on Stata for Windows.
2. Unfortunately, even if your dataset is under the 2.1 GB limit, you may run into difficulty when loading it into Stata. The fault again lies with how Windows manages the 2.1 GB address space. You may be surprised to find that a 1.4 GB dataset loaded fine one time, but failed to load a subsequent time. This is simply an unfortunate side effect of Windows memory management.
3. The 64-bit platform will enable you to work with very large datasets. Depending on your operating system, you should be able to allocate as much memory as you have on the machine minus the system requirements. To take advantage of this technology, you will need 64-bit compatible hardware, a 64-bit operating system, and, of course, a 64-bit version of Stata.
4. As a last resort, you may consider trimming any unnecessary data from your dataset or dividing the dataset into two files. Depending on your data and analysis this may not be feasible, and is only offered as a suggestion.
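As a rough back-of-the-envelope check for the points above, the size of a dataset can be estimated as the number of cases times the bytes per case and compared with the 2.1 GB figure from the FAQ. The sketch below is only illustrative: the function names are made up, 1 GB is assumed to mean 10^9 bytes, and, as point 2 warns, fitting under the limit on paper does not guarantee that the file will actually load.

```python
import math

ADDRESS_SPACE_LIMIT_GB = 2.1  # figure quoted in the FAQ for 32-bit Windows


def estimated_size_gb(n_cases, bytes_per_case):
    """Rough size: cases times bytes per case (assumes 1 GB = 10**9 bytes)."""
    return n_cases * bytes_per_case / 1e9


def pieces_needed(n_cases, bytes_per_case, limit_gb=ADDRESS_SPACE_LIMIT_GB):
    """Smallest number of equal pieces so each piece is at or under the limit."""
    size_gb = estimated_size_gb(n_cases, bytes_per_case)
    return max(1, math.ceil(size_gb / limit_gb))


def split_ranges(n_cases, n_pieces):
    """Return (start, stop) case ranges, stop exclusive; assumes n_cases > 0."""
    step = math.ceil(n_cases / n_pieces)
    return [(start, min(start + step, n_cases))
            for start in range(0, n_cases, step)]


# Hypothetical dataset: 10 million cases at 400 bytes each, about 4.0 GB.
n_pieces = pieces_needed(10_000_000, 400)
print(n_pieces)                            # 2
print(split_ranges(10_000_000, n_pieces))  # [(0, 5000000), (5000000, 10000000)]
```

In this made-up case the data would have to be divided into two files, which matches the "last resort" in point 4; on a 64-bit system the same estimate would be compared with the installed RAM instead.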