Django Admin: Creating a csv uploader to seed the database

By Ethan Shearer June 18, 2019

Post title

Introduction

Sometimes, when building a Django application, we come across interesting data we want to use in our project. However, this data is not always available in a nice format such as an API or SQL file. If this data currently resides in a bunch of csv files, some of which can be over 100,000 lines long, our task is to create functionality in the Django admin that allows us to upload csv files to the database.

The project we are building needs to cater for the following:

  • Large files shouldn’t take a ridiculous amount of time to upload
  • The function should handle integrity errors without crashing our application
  • The admin page should have a file upload form
  • The upload form should work on all models to ensure our application remains DRY (Dont Repeat Yourself)

Extending the Admin View

To begin, let’s create the form that will be responsible for uploading our files:

1{% extends 'admin/change_list.html' %}
2{% block content %}
3    {{ block.super }}
4    <form action="upload-csv/" method="post" enctype="multipart/form-data">
5        {% csrf_token %}
6        {{ csv_upload_form }}
7        <input type="submit" value="Upload">
8    </form>
9{% endblock %}

Next, we need to extend the ModelAdmin class to include our new form. In admin.py, below where we defined our CsvUploadForm, create the following class:

 1class CsvUploadAdmin(admin.ModelAdmin):
 2
 3    change_list_template = "custom_admin/csv_form.html"
 4
 5    def get_urls(self):
 6        urls = super().get_urls()
 7        additional_urls = [
 8            path("upload-csv/", self.upload_csv),
 9        ]
10        return additional_urls + urls
11
12    def changelist_view(self, request, extra_context=None):
13        extra = extra_context or {}
14        extra["csv_upload_form"] = CsvUploadForm()
15        return super(CsvUploadAdmin, self).changelist_view(request, extra_context=extra)
16
17    def upload_csv(self, request):
18        if request.method == "POST":
19
20            # Here we will process the csv file  
21
22        return redirect("..")

At this stage we need to define our new template, so the form will appear under our list of models.

The method get_urls() first calls super, then adds a new url to the list. This url will handle our form.

We then need to add some extra context to our admin view. In the method changelist_view(), we add our form class to the context. Finally, we create our method to handle the file. This method will be called when a post request is sent to the url upload-csv/

To use the new form in our admin class, we simply inherit from our new base class:

1@admin.register(SomeModel)
2class SomeModelAdmin(CsvUploadAdmin):
3    pass

Saving the CSV to the database

Now let’s add some functionality to our new form. Create a new class to handle the csv file and save it to the database. Then create an instance of the class and call its create_records() method. But before we do this let’s do some basic validation on the file:

 1def upload_csv(self, request):
 2    if request.method == "POST":
 3        form = CsvUploadForm(request.POST, request.FILES)
 4        if form.is_valid():
 5            if request.FILES['csv_file'].name.endswith('csv'):
 6
 7                try:
 8                    decoded_file = request.FILES['csv_file'].read().decode('utf-8')
 9                except UnicodeDecodeError as e:
10                    self.message_user(
11                        request,
12                        "There was an error decoding the file:{}".format(e),
13                        level=messages.ERROR
14                    )
15                    return redirect("..")
16
17				# Here we will call our class method
18				io_string = io.StringIO(decoded_file)
19                uploader = CsvUploader(io_string, self.model)
20				result = uploader.create_records()
21
22           else:
23               self.message_user(
24               request,
25               "Incorrect file type: {}".format(
26                   request.FILES['csv_file'].name.split(".")[1]
27                   ),
28               level=messages.ERROR
29               )
30
31    else:
32        self.message_user(
33            request,
34            "There was an error in the form {}".format(form.errors),
35            level=messages.ERROR
36        )
37
38return redirect("..")

You may wish to go into more detail with the validation. As I am only using this to seed the database, the above was enough.

Now let’s create the class that will handle saving our data. Create a new file called csv_uploader.py. In this file we first need to import the Python CSV module and some other modules from Django. We will be using these later.

 1import csv
 2
 3from django.db.utils import IntegrityError
 4from django.core.exceptions import FieldDoesNotExist
 5from django.db import transaction
 6
 7class CsvUploader:
 8
 9    def __init__(self, csv_file, model_name, app_name):
10        self.reader = list(csv.DictReader(csv_file, delimiter=','))
11        self.keys = [k for k in self.reader[0]]
12        self.model_fields = [f.name for f in  self.model._meta.get_fields()]
13        self.valid = self._validate_csv()
14        self.csv_pos = 0
15
16    def _validate_csv(self):
17
18        keys = []
19        for k in self.keys:
20            if k.endswith("_id"):
21                keys.append(k[:-3])
22            else:
23                keys.append(k)
24        return set(keys).issubset(self.model_fields)

In the __init__() method I pass the csv_file and the model we will be saving to. I then set some properties which will be helpful later on. The validate_csv() method checks that our csv column headings match our database structure. The csv_pos attribute keeps a record of how far through the CSV file we have read.

Now let’s save it to our database! We will use Django’s bulk create method to reduce the number of database queries. We don’t want our bulk create to be too big, as there are limits to the size of database queries. Let’s break our data up into manageable chunks. The following method should do the trick:

 1def read_chunk(self):
 2    chunk = []
 3    for i in range(1000):
 4        try:
 5            chunk.append(self.model(**self.reader[self.csv_pos]))
 6        except IndexError as e:
 7            print(e)
 8            break
 9        self.csv_pos += 1
10    return chunk

The final piece of the puzzle is to execute our queries. We first attempt to bulk_create each chunk of 1000. If that fails, we fall back to saving them one by one:

 1def create_records(self):
 2
 3    if not self.valid:
 4        return "Invalid csv file"
 5
 6    while True:
 7        chunk = self.read_chunk()
 8
 9        if not chunk:
10            break
11
12        try:
13            with transaction.atomic():
14                self.model.objects.bulk_create(chunk)
15        except IntegrityError as e:
16            for i in chunk:
17                try:
18                    i.save()
19                except IntegrityError:
20                    continue
21            print("Exeption: {}".format(e))
22
23      return "records succesfully saved!"

If the bulk_create function raises an error, such as a duplicate primary key, we roll Django back. Then continue with executing our queries one by one. To do that we need to use transaction.atomic().