
Loading and analyzing data, data mining

I am considering using Sage as a data-mining tool. Is this way off base? If so, where should I head? I do not see much in the reference docs on loading and interfacing with large-ish data files, or on cleaning, standardizing, and faceting data.

In particular, I have an 18 MB compressed CSV file that expands to at least 512 MB. It has about 20 fields per record and 400,000 or so records (it is residential real-estate listing data). I need to summarize, average, and count by area, subdivision, and date (and more). I had been using an Access database, but am looking to move to Python and to avoid an SQL structure if possible, since it now seems there is enough memory to hold the whole dataset in RAM.
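
To make the aggregation concrete, here is roughly the kind of computation I mean, as a plain-Python sketch. The field names 'area' and 'price' and the sample rows are made up for illustration; the real file has its own ~20 columns.

    from collections import defaultdict

    # Placeholder sample rows; in practice these would be read from the CSV.
    records = [
        {'area': 'North', 'price': '250000'},
        {'area': 'North', 'price': '310000'},
        {'area': 'South', 'price': '180000'},
    ]

    # Accumulate a count and a running price total per area,
    # then derive the average at the end.
    totals = defaultdict(lambda: [0, 0.0])
    for row in records:
        acc = totals[row['area']]
        acc[0] += 1
        acc[1] += float(row['price'])

    for area, (count, total) in sorted(totals.items()):
        print(area, count, total / count)

The same pattern would presumably extend to subdivision and date by using a tuple as the dictionary key.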

My brief search has not turned up any similar examples. At the least, could anyone offer pointers on loading the compressed CSV, and guidance on the right Python/Sage data structures to start with?
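
For what it is worth, here is the direction I had in mind for the loading step, assuming the file is gzip-compressed (it might be zip instead) and that the first row holds the field names; 'listings.csv.gz' is a placeholder filename:

    import csv
    import gzip

    # Stream the compressed file row by row without decompressing to disk,
    # so the full ~512 MB expansion is never held in memory all at once.
    with gzip.open('listings.csv.gz', 'rt', newline='') as f:
        reader = csv.DictReader(f)  # header row supplies the ~20 field names
        for row in reader:
            pass  # clean, standardize, and aggregate each record here

Does that look like a sensible starting point, or is there a better Sage-native structure for this?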
