How much memory does importing the pandas library take?

Objective

Let us measure how much memory importing the Python pandas library consumes.

Methodology

A small script called memtest.py (the @profile decorator is supplied by memory_profiler when the script is run as shown below):


@profile
def a():
    import pandas

if __name__=="__main__":
    a()

Test it with

$ python -m memory_profiler memtest.py

Results:

Output:

The memory increment reported for line 4 shows that importing pandas took 36.527 MB on my machine. What does your benchmark result look like?
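For a rough cross-check without memory_profiler, the sketch below uses the standard library's tracemalloc module. It is only an approximation under stated assumptions: tracemalloc counts allocations made through Python's own allocator, so its figures will not match the process-level numbers that memory_profiler reports. The helper name import_cost_mib is hypothetical, and "json" is a stand-in module so the example runs anywhere.

```python
import importlib
import tracemalloc


def import_cost_mib(module_name):
    """Return (current, peak) MiB traced by tracemalloc while importing module_name."""
    tracemalloc.start()
    try:
        importlib.import_module(module_name)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    mib = 1024 * 1024
    return current / mib, peak / mib


if __name__ == "__main__":
    # "json" is a stand-in; pass "pandas" to repeat the experiment above.
    current, peak = import_cost_mib("json")
    print(f"current={current:.3f} MiB, peak={peak:.3f} MiB")
```

If the module was already imported earlier in the same process, Python returns the cached module and the measured cost will be close to zero, so run the measurement in a fresh interpreter.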


Merge a repo with another as a subfolder

Sometimes we may end up with one main repository and another, independently developed repository for a new feature. Later it may turn out that the independent repo needs to become part of the main repo as a subfolder.

To do that, we can use git subtree add from inside the main repo. We pass the new subfolder name as the --prefix, followed by the URL and branch of the repository to import, and voila!

git subtree add --prefix=new_subdirectory_name git://github.com/userid/main_repo.git project_branch_name

Best way to set up PYTHONPATH for crontab

When setting up a crontab job on a Linux machine, these steps are essential for the job to run successfully:

  1. Update the cron file (crontab -e) by adding the new script with its schedule
  2. Check the frequency of the schedule. For example, to run every 7 minutes (at minutes 0, 7, 14, ... 56 of each hour), use
    */7 * * * *   python /path/to/script.py

    Or, to run every hour at the 7th minute, use

    7  *  *  *  *  python /path/to/script.py
  3. Check the file permissions for the script. If it is not executable, make it executable
    ls -l /path/to/script.py

    Then, if the file is not executable by the user, add the permission with

    chmod 755 /path/to/script.py

    (You may need sudo access if you are not the owner of the file. To change the ownership of a file or folder, use the chown command.)

  4. Check the file permissions for the output location if the script produces files. If the containing folder or the output file is not writable by the cron user, make it writable, following the same process as in step 3.
  5. You may redirect any print statements (and error messages) to a file, such as
    7 * * * * python /path/to/script.py > /path/to/output/file.txt 2>&1
  6. Or better yet, use the Python logging library to create useful log files that show whether the program ran correctly.
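As an illustration of step 6, here is a minimal sketch of file-based logging for a cron-driven script. The logger name "cron_job" and the helper names configure_logging and run_job are hypothetical, and the body of the job is left as a placeholder.

```python
import logging
import sys


def configure_logging(log_path):
    """Send log records to log_path, one timestamped line per record."""
    logger = logging.getLogger("cron_job")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def run_job(logger):
    """Placeholder for the real work; returns an exit status for cron."""
    logger.info("job started with interpreter %s", sys.executable)
    try:
        # ... the script's actual work would go here ...
        logger.info("job finished")
        return 0
    except Exception:
        # logger.exception also writes the full traceback to the log file
        logger.exception("job failed")
        return 1
```

A script would typically end with sys.exit(run_job(configure_logging("/path/to/output/script.log"))), where the log path is a placeholder and must be writable by the cron user (step 4). Logging the interpreter path helps confirm which Python cron actually picked up.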

How to unzip and read gzipped JSON files from a URL in Python

The Problem

Sometimes we end up gzipping JSON files and putting them up somewhere on the web. Later we need to read one back from the HTTP server and parse it using Python. Let us assume that the gzipped JSON is located at this URL:
http://example.com/python_list_turned_into.json.gz. To read this file, we need to do the following:

– Build a request for the file using urllib2
– Add an Accept-encoding header telling the server that we accept gzip
– Then open the URL and read the compressed bytes
– Next, decompress the bytes using the gzip library and load the JSON!

That’s it!

The Python Code

Let’s look at the code snippet below, which does exactly what we need.


# Python 2 code: urllib2 and StringIO do not exist in Python 3
import urllib2
import json
import gzip
import StringIO

def unzipper(url="http://example.com/python_list_turned_into.json.gz"):
    # Tell the server that we accept gzip-encoded content
    request = urllib2.Request(url)
    request.add_header('Accept-encoding', 'gzip')
    opener = urllib2.build_opener()
    f = opener.open(request)
    # Read the compressed bytes and wrap them in a file-like object
    compresseddata = f.read()
    compressedstream = StringIO.StringIO(compresseddata)
    # Decompress, then parse the JSON text
    gzipper = gzip.GzipFile(fileobj=compressedstream)
    data = gzipper.read()
    list_output = json.loads(data)

    return list_output
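The snippet above targets Python 2. A possible Python 3 equivalent is sketched below, using urllib.request and gzip.decompress from the standard library. The helper parse_gzipped_json is a hypothetical name added so that decompression can be tried without network access, and the sketch assumes the JSON text is UTF-8 encoded.

```python
import gzip
import json
import urllib.request


def parse_gzipped_json(compressed):
    """Decompress gzip bytes and parse the result as JSON (assumes UTF-8)."""
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


def unzipper(url="http://example.com/python_list_turned_into.json.gz"):
    """Fetch a gzipped JSON file over HTTP and return the parsed object."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        compressed = response.read()
    return parse_gzipped_json(compressed)


if __name__ == "__main__":
    # Round trip on local data, so no network access is needed
    sample = gzip.compress(json.dumps(["a", "b", 3]).encode("utf-8"))
    print(parse_gzipped_json(sample))
```

Keeping the parsing step separate from the download makes it easy to check the decompression logic on local bytes before pointing unzipper at a real URL.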