Tech evaluation: Object storage using presigned URLs

This is a follow-up tech evaluation from #355 (closed)

@ayufan thanks for your input on Slack! (Copying it here.)

  • We need to use pre-signed URLs from GitLab. That way Pages does not need any credentials, and whether the .zip is used can be controlled by Rails exclusively; the link would carry an encoded, Rails-controlled expiry date.
  • If serving from .zip, we likely need to define the maximum archive size we can support, filter down to the relevant files (the public/ folder only), and hold that somewhere in memory. We could make the number of cached files-in-archives/archives configurable and optimise it towards cache-hit ratio; GitLab.com would likely allow us to use a lot of memory if needed.
  • I would likely drop support for Content-Range when serving files, as I don't think that is cheaply possible with .zip.
  • GitLab Workhorse does have OpenArchive, which supports local and remote archives, but it is not performance-optimised: the HTTP requests are badly aligned and will likely need to be improved somehow, so just copy-pasting it will not give great performance yet.

Diagram/proposal


```mermaid
sequenceDiagram
  participant U as User
  participant P as gitlab-pages
  participant G as gitlab-workhorse and rails
  participant OS as Object Storage
  U->>P: 1. username.gitlab.io/index.html
  P->>G: 2. GET /api/v4/internal/pages?host=username.gitlab.io
  G->>P: 3. {... lookup_paths: [{source: {type: "zip", path: "presignedURL"}}], ...}
  loop zipartifacts
    P->>P: 4. reader := OpenArchive(presignedURL)
    P->>OS: 5. GET presignedURL
    OS->>P: 6. .zip file
    P->>P: 7. reader.find(public/index.html)
    P->>P: 8. go func(){ cache({host, reader}) }()
  end
  P->>U: 9. username.gitlab.io/index.html
```

Proposal

In this PoC we will hardcode the response from /api/v4/internal/pages to reduce the scope. I will use MinIO, which is already supported in the GDK. I'll also shamelessly steal and slightly modify the zipartifacts package from Workhorse.

To address #377 (comment 367358348) the source type should be "zip" so that Pages can serve from .zip regardless of the path (pre-signed URL or disk path).

Outcomes

We now have &3901 (closed) and &3902 (closed), with parent &1316 (closed), to track all future efforts.

Rails

  1. Allow deploying Pages as .zip archives with a max_archive_size. gitlab#208135 (closed)
  2. On deploy: check the size, then store public.zip either on disk or in object storage, depending on the features enabled. Also tracked in gitlab#208135 (closed)
  3. Update /api/v4/internal/pages to return a "source"."type":"zip" with a path gitlab#225840 (closed), e.g.
```json
{
	"lookup_paths": [
		{
			"source": {
				"type": "zip",
				"path": "https://presigned.url/public.zip",
				"_": "or a local disk path, e.g. /shared/pages/domain/project/public.zip"
			}
		}
	]
}
```

Pages (Go)

  1. Extract the resolvePath logic from disk serving into its own package so it can be shared. #421 (closed)
  2. Add a zip package with a zip/reader gitlab#28784 (closed)
  3. Add zip serving to Pages, allowing serving from disk or from pre-signed object storage URLs gitlab#28784 (closed)
  4. Implement a zip reader caching mechanism #422 (closed)
  5. Add metrics for zip serving #423 (closed)
  • While testing I hit #371, so I think it would be valuable to work on that issue first.