Chunked file uploader
Dear self, here in 2023 we have neither flying cars nor hoverboards. And, most ridiculous of all, transferring files between devices is still a problem. I hope you managed to read this and created a better timeline for yourself.
But as long as I'm stuck in this present, I have to solve the problem once again. MTP stopped working for some reason and I was unable to move files off my Android phone. I thought, why not add an uploader to this site? That would be a nice and useful feature. And just as I started working on it, someone came to DNG, a Devuan mailing list, with the same problem. "Damned phones", that's what they called it. Funny, eh? I'm not alone.
Given that file upload is inherently shitty in HTTP, I rejected the basic approach from the beginning. The only option was a chunked uploader, which implies some code on the client side. Although I'm a fan of tiny web sites, I'm not an ascetic, so I'm not against some code. I'm just not a fan of JavaScript. I have known it since its infancy and clearly remember how ugly it was. Later on jQuery made things better, and at work I used some uploader plugins without digging into the details, simply because of time constraints. But now, for myself, I wanted to walk the path from the very beginning. Nowadays everything is standardised and there has been no need for jQuery since circa 2015, which is great. Nevertheless, I stepped on a couple of poops.
Okay, enough rants, here's the core of the uploader:
(() => {
class FileUploader
{
    constructor(settings)
    {
        const default_settings = {
            url: '/',
            chunk_size: 512 * 1024,  // the last chunk can be up to 1.5x this size,
                                     // mind the server request body size limit
                                     // (e.g. for NGINX the default value is 1M)
            file_name_header: 'File-Name'  // something standard like Content-Disposition
                                           // might look better, but needs more work on parsing
        };
        this.settings = Object.assign({}, default_settings, settings);
        this.upload_queue = [];
    }
    upload(file, params)
    /*
     * Add file to the queue and if no upload is in progress, start it.
     */
    {
        const start_upload = this.upload_queue.length == 0;
        // create file item and append it to the end of the queue
        const file_item = new FileItem(this, file, params);
        this.upload_queue.push(file_item);
        if(start_upload) {
            // calling async function without await will simply return a promise,
            // either already fulfilled or pending; we don't need it anyway
            this._async_upload_files().then();
        }
    }
    progress(file, params, chunk_start_percentage, chunk_end_percentage, percentage)
    /*
     * Called on upload progress.
     * Override in subclass as necessary.
     */
    {
    }
    async upload_complete(file, params)
    /*
     * Called when upload complete.
     * Override in subclass as necessary.
     */
    {
    }
    async _async_upload_files()
    {
        // process the queue
        while(this.upload_queue.length != 0) {
            await this.upload_queue[0].upload();
            this.upload_queue.shift();
        }
    }
}
class FileItem
/*
 * An item for the upload queue.
 */
{
    constructor(uploader, file, params)
    {
        this.uploader = uploader;
        this.file = file;
        this.params = params;
    }
    async upload()
    {
        let chunk_start = 0;
        let chunk_size;
        while(chunk_start < this.file.size) {
            const remaining_size = this.file.size - chunk_start;
            // upload in fixed-size chunks, the last chunk can be up to 1.5 x settings.chunk_size
            if(remaining_size < 1.5 * this.uploader.settings.chunk_size) {
                chunk_size = remaining_size;
            } else {
                chunk_size = this.uploader.settings.chunk_size;
            }
            const chunk = this.file.slice(chunk_start, chunk_start + chunk_size);
            // XXX hack, save start, end in chunk object
            chunk.start = chunk_start;
            chunk.end = chunk_start + chunk_size;
            // retry the same chunk until it goes through
            while(true) {
                try {
                    await this._upload_chunk(chunk);
                    break;
                } catch(error) {
                    console.log(`${this.file.name} upload error, retry in 5 seconds`);
                    await new Promise(resolve => setTimeout(resolve, 5000));
                }
            }
            chunk_start += chunk_size;
        }
        await this.uploader.upload_complete(this.file, this.params);
    }
    _upload_chunk(chunk)
    {
        // Although this function is not async, it is called with await.
        // XMLHttpRequest is callback-based, not awaitable,
        // so we wrap it in a promise by hand and return that promise.
        const self = this;
        return new Promise((resolve, reject) => {
            const reader = new FileReader();
            const xhr = new XMLHttpRequest();
            xhr.upload.addEventListener(
                "progress",
                (e) => {
                    if(e.lengthComputable) {
                        const percentage = Math.round((e.loaded * 100) / e.total);
                        self._update_progress(chunk, percentage);
                    }
                },
                false
            );
            xhr.onreadystatechange = () => {
                if(xhr.readyState === xhr.DONE) {
                    if(xhr.status === 200) {
                        self._update_progress(chunk, 100);
                        resolve(xhr.response);
                    } else {
                        reject({
                            status: xhr.status,
                            statusText: xhr.statusText
                        });
                    }
                }
            };
            xhr.onerror = () => {
                reject({
                    status: xhr.status,
                    statusText: xhr.statusText
                });
            };
            xhr.open('POST', this.uploader.settings.url);
            const content_range = `bytes ${chunk.start}-${chunk.end - 1}/${this.file.size}`;
            xhr.setRequestHeader("Content-Range", content_range);
            xhr.setRequestHeader("Content-Type", "application/octet-stream");
            xhr.setRequestHeader(this.uploader.settings.file_name_header, this.file.name);
            reader.onload = (e) => {
                xhr.send(e.target.result);
            };
            // without this a failed read would leave the promise pending forever
            reader.onerror = () => {
                reject(reader.error);
            };
            reader.readAsArrayBuffer(chunk);
            self._update_progress(chunk, 0);
        });
    }
    _update_progress(chunk, percentage)
    {
        // calculate percentages and call progress method
        const chunk_start_percentage = chunk.start * 100 / this.file.size;
        const chunk_end_percentage = chunk.end * 100 / this.file.size;
        const upload_percentage = chunk_start_percentage + chunk.size * percentage / this.file.size;
        this.uploader.progress(
            this.file,
            this.params,
            chunk_start_percentage.toFixed(2),
            chunk_end_percentage.toFixed(2),
            upload_percentage.toFixed(2)
        );
    }
}
// "export" FileUploader
window.FileUploader = FileUploader;
})();
The code above is a simplified version of https://declassed.art/repository/declassed.art/file/cc539a03ed26/embedded/js/uploader.js. I dropped some pieces to make it as simple as possible. Same for HTML and the rest of the code:
<h3>Upload Files</h3>
<p>
    <button id="file-select">Choose Files</button> or drag and drop to the table below
</p>
<table id="file-list">
    <thead>
        <tr><th>File name</th><th>Size</th></tr>
    </thead>
    <tbody>
    </tbody>
</table>
<template id="file-row">
    <tr><td></td><td></td></tr>
</template>
<input type="file" id="files-input" multiple style="display:none">
<script>
const upload_complete_color = 'rgb(0,192,0,0.2)';
const chunk_complete_color = 'rgb(0,255,0,0.1)';
class Uploader extends FileUploader
{
    constructor()
    {
        super({url: '/api/feedback/upload'});
        this.elem = {
            file_select: document.getElementById("file-select"),
            files_input: document.getElementById("files-input"),
            file_list: document.getElementById("file-list"),
            row_template: document.getElementById('file-row')
        };
        this.elem.tbody = this.elem.file_list.getElementsByTagName('tbody')[0];
        this.row_index = 0;
        this.set_event_handlers();
    }
    set_event_handlers()
    {
        const self = this;
        this.elem.file_select.addEventListener(
            "click",
            () => { self.elem.files_input.click(); },
            false
        );
        this.elem.files_input.addEventListener(
            "change",
            () => { self.handle_files(self.elem.files_input.files); },
            false
        );
        function consume_event(e)
        {
            e.stopPropagation();
            e.preventDefault();
        }
        function drop(e)
        {
            consume_event(e);
            self.handle_files(e.dataTransfer.files);
        }
        this.elem.file_list.addEventListener("dragenter", consume_event, false);
        this.elem.file_list.addEventListener("dragover", consume_event, false);
        this.elem.file_list.addEventListener("drop", drop, false);
    }
    progress(file, params, chunk_start_percentage, chunk_end_percentage, percentage)
    {
        params.progress_container.style.background = 'linear-gradient(to right, '
            + `${upload_complete_color} 0 ${percentage}%, `
            + `${chunk_complete_color} ${percentage}% ${chunk_end_percentage}%, `
            + `transparent ${chunk_end_percentage}%)`;
    }
    async upload_complete(file, params)
    {
        // make entire row green
        params.progress_container.style.background = upload_complete_color;
        // nextElementSibling skips any whitespace text nodes between the cells
        params.progress_container.nextElementSibling.style.background = upload_complete_color;
    }
    handle_files(files)
    /*
     * handle files coming from either drag'n'drop or file selection dialog
     */
    {
        for(const file of files) {
            const cols = this.append_file(file);
            this.upload(file, {progress_container: cols[0]});
        }
    }
    append_file(file)
    /*
     * Append file to the table, display its name and size.
     * Return list of cells.
     */
    {
        const rows = this.elem.tbody.getElementsByTagName("tr");
        let row;
        if(this.row_index >= rows.length) {
            row = this.append_row();
        } else {
            row = rows[this.row_index];
        }
        this.row_index++;
        const cols = row.getElementsByTagName("td");
        cols[0].textContent = file.name;
        cols[1].textContent = file.size.toString();
        return cols;
    }
    append_row()
    /*
     * Append empty row to the table.
     * Return row element.
     */
    {
        const row = this.elem.row_template.content.firstElementChild.cloneNode(true);
        this.elem.tbody.appendChild(row);
        return row;
    }
}
const uploader = new Uploader();
// add initial empty rows to the table
for(let i = 0; i < 5; i++) uploader.append_row();
</script>
You can find the full code here: https://declassed.art/repository/declassed.art/file/cc539a03ed26/content/feedback.yaml. Yes, it's a weird mixture of HTML, JavaScript, and CSS inside YAML, but that's my favorite toy project. I have always loved blending different languages.
Note that the original version tracks file names to avoid duplicates. It also has a concept of folders, so multiple uploaders do not interfere with each other. But this is where the complications start, and the particular implementation is up to the developer.
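To give an idea of what tracking file names could look like, here is a minimal sketch. It is not taken from the original code: NameTracker and filter_new_files are hypothetical names, and the sketch assumes a file is identified by its name alone, with no folders involved.

```javascript
// Hypothetical helper (not part of the original uploader):
// remembers which file names have already been queued.
class NameTracker
{
    constructor()
    {
        this.names = new Set();
    }
    add(name)
    /*
     * Return true the first time a name is seen, false afterwards.
     */
    {
        if(this.names.has(name)) {
            return false;
        }
        this.names.add(name);
        return true;
    }
}

// Keep only the files whose names were not queued before.
// Works with a FileList as well as with a plain array.
function filter_new_files(tracker, files)
{
    return Array.from(files).filter(file => tracker.add(file.name));
}
```

In handle_files() one would then loop over filter_new_files(tracker, files) instead of files, so a file dropped twice is appended to the table and uploaded only once.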
Finally, a server-side handler:
import os.path
import re
from starlette.responses import Response
import aiofiles.os
# At the time of writing, aiofiles.os had no such wrappers:
aiofiles.os.open = aiofiles.os.wrap(os.open)
aiofiles.os.close = aiofiles.os.wrap(os.close)
aiofiles.os.lseek = aiofiles.os.wrap(os.lseek)
aiofiles.os.write = aiofiles.os.wrap(os.write)
re_content_range = re.compile(r'bytes\s+(\d+)-(\d+)/(\d+)')
# the expose decorator is not defined in this excerpt, see the complete source
@expose(methods='POST')
async def upload(self, request):
    '''
    Upload chunk of file.
    '''
    data = await request.body()
    # basename() drops any directory components sent by the client
    filename = os.path.basename(request.headers['File-Name'])
    start, end, size = [int(n) for n in re_content_range.search(request.headers['Content-Range']).groups()]
    # O_CREAT without O_TRUNC: the first chunk creates the file, later chunks keep its contents
    fd = await aiofiles.os.open(filename, os.O_CREAT | os.O_RDWR, mode=0o666)
    try:
        # write the chunk at its own offset within the file
        await aiofiles.os.lseek(fd, start, os.SEEK_SET)
        await aiofiles.os.write(fd, data)
    finally:
        await aiofiles.os.close(fd)
    return Response()
Like all the code above, it's a simplified version. Here's the complete one: https://declassed.art/repository/declassed.art/file/cc539a03ed26/api/declassed_api.py. For files I prefer functions that use file descriptors, because there's no way to specify os.O_CREAT | os.O_RDWR with higher-level functions: none of the open() mode strings both creates a missing file and lets you write at an arbitrary offset without truncating it.
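To see what the handler actually receives, the chunk boundaries and the matching Content-Range values can be pulled out of FileItem.upload() into standalone functions. This is only a sketch: plan_chunks, format_content_range and parse_content_range are names made up for this illustration, and the parser merely mirrors the server-side regular expression in JavaScript.

```javascript
// Hypothetical helpers for illustration, not part of the original uploader.

// Split file_size bytes into {start, end} ranges (end is exclusive).
// Same rule as in FileItem.upload(): when less than 1.5 x chunk_size
// remains, the rest goes out as one final chunk.
function plan_chunks(file_size, chunk_size)
{
    const chunks = [];
    let start = 0;
    while(start < file_size) {
        const remaining = file_size - start;
        const size = (remaining < 1.5 * chunk_size)? remaining : chunk_size;
        chunks.push({start: start, end: start + size});
        start += size;
    }
    return chunks;
}

// Content-Range uses an inclusive last byte position, hence end - 1.
function format_content_range(chunk, file_size)
{
    return `bytes ${chunk.start}-${chunk.end - 1}/${file_size}`;
}

// Counterpart of re_content_range on the server; returns null on mismatch.
function parse_content_range(value)
{
    const m = /bytes\s+(\d+)-(\d+)\/(\d+)/.exec(value);
    if(m === null) {
        return null;
    }
    return {start: Number(m[1]), end: Number(m[2]), size: Number(m[3])};
}
```

With a chunk size of 10, a 24-byte file goes out as bytes 0-9 and 10-23, whereas a 25-byte file needs three requests: 0-9, 10-19 and 20-24. Note that an empty file produces no chunks at all, so the server never hears about it.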
So what were the poops on my way? Well, the first one was modules. I think they're a cool feature of JavaScript, but I embed the code in the page (this eliminates the need for versioning), so the module becomes anonymous and it's impossible to export anything from it. Such modules are quite useless, and for namespace isolation I had to use an old-school closure and set window.FileUploader. Yuck!
The second poop is not directly related to this uploader, but it's worth mentioning since the uploader was part of a bigger piece of work. It's the lack of synchronization primitives. No semaphores, no mutexes. How are you supposed to serialize the execution of multiple tasks if only one of them should fetch something from an API and cache it for all the others? Yuck again.
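For what it's worth, one common workaround is to cache the promise itself rather than the value. The sketch below is not from the original code, and single_flight is a hypothetical name. It relies on the fact that JavaScript runs one task at a time, so the check and the assignment below cannot be interleaved with another caller.

```javascript
// Hypothetical helper for illustration: wraps a function that returns
// a promise so that concurrent callers share one in-flight request
// and later callers reuse the cached result.
function single_flight(fetch_value)
{
    let pending = null;
    return () => {
        if(pending === null) {
            // No await happens between the check above and this assignment,
            // so only the first caller ever starts the request.
            pending = Promise.resolve().then(fetch_value).catch((error) => {
                pending = null;  // forget the failure so the next call retries
                throw error;
            });
        }
        return pending;
    };
}
```

Every task simply awaits the wrapped function: the first call starts the request, the rest wait on the same promise. It covers this particular fetch-once-and-cache case, but it is not a general-purpose mutex.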