Troubleshooting self-hosted GitLab - root user missing
I’ve been going through a couple of projects from Predrag Mijatovic’s realistic and useful course on devops. Once in a while, I bump into some issues - many because the tools have changed over time.
In the section on installing and setting up GitLab (on a VPS), I ran into a couple of problems.
These are the versions I’m using:
---------------------------------------------------------------------
Ruby: ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
GitLab: 16.3.0 (85a896db163) FOSS
GitLab Shell: 14.26.0
PostgreSQL: 13.11
-------------------------------------------------[ booted in 24.69s ]
Loading production environment (Rails 7.0.6)
These notes might be useful hints in the future.
I assume that you’re familiar with these tools. Or at least, that you’re going through that course.
TL;DR
- Ensure the machine has enough RAM.
- Ensure that the initial root password adheres to GitLab’s password requirements: long enough and doesn’t contain “devops” or “gitlab”. Otherwise, the root user is not created, meaning you can’t log into the web UI or reset its password.
Problem 1 - GitLab not displayed in Traefik because of low RAM
I ran the GitLab ansible role on a machine with low memory. I got this error:
root@vps ~$> cd services/gitlab
root@vps services/gitlab $> docker-compose logs | grep error
# output
gitlab.example.com | [2023-08-26T08:02:10+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: rails_migration[gitlab-rails] (gitlab::database_migrations line 51) had an error: Mixlib::ShellOut::ShellCommandFailed: bash_hide_env[migrate gitlab-rails database] (gitlab::database_migrations line 20) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '137'
Expected process to exit with [0], but received '137'
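Exit code 137 is 128 + 9: the process was killed with SIGKILL, which on a memory-starved machine usually means the kernel’s OOM killer struck. A quick way to confirm before resizing (standard commands, though not part of my original session):
# how much memory does the box actually have?
root@vps ~$> free -h
# look for OOM-killer activity in the kernel log
root@vps ~$> dmesg | grep -iE 'killed process|out of memory'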
Adding more RAM by resizing the Linode, as suggested in a github issue, and re-running the role solved that part.
Problem 2 - GitLab not displayed in Traefik because of already existing schemas
On restart, I didn’t see the service in Traefik. I peeked at the logs, only to be greeted with:
root@vps ~$> cd services/gitlab
root@vps services/gitlab $> docker-compose logs | grep error
gitlab.example.com | [2023-08-26T09:02:10+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: rails_migration[gitlab-rails] (gitlab::database_migrations line 51) had an error: Mixlib::ShellOut::ShellCommandFailed: bash_hide_env[migrate gitlab-rails database] (gitlab::database_migrations line 20) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
gitlab.example.com | ---- Begin output of "bash" ----
gitlab.example.com | STDOUT: rake aborted!
gitlab.example.com | StandardError: An error has occurred, this and all later migrations canceled:
gitlab.example.com |
gitlab.example.com |
gitlab.example.com | PG::DuplicateSchema: ERROR: schema "gitlab_partitions_dynamic" already exists
Does that schema really exist?
Let’s check. GitLab’s troubleshooting docs give us some hints:
[root@vps ~]$> docker exec -it gitlab.example.com /bin/bash
[root@gitlab ~]$> gitlab-psql -d gitlabhq_production
# output
psql (13.11)
Type "help" for help.
gitlabhq_production=> \dn
       List of schemas
            Name            |    Owner
----------------------------+-------------
 gitlab_partitions_dynamic  | gitlab
 gitlab_partitions_static   | gitlab
 public                     | gitlab-psql
(3 rows)
Ok, that schema exists.
I’m guessing that, after the initial failure, the docker compose resources still exist, and when ansible runs again it reuses them instead of creating new ones. What came to mind was to delete the container and its volumes.
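Before deleting anything, it’s worth seeing what actually survived the failed run. A hypothetical check (I didn’t run this at the time):
# any stopped container left over from the failed run
root@vps ~$> docker ps -a --filter name=gitlab
# any named volumes the compose project created
root@vps ~$> docker volume ls --filter name=gitlab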
On the VPS, running docker-compose down --volumes gitlab didn’t work.
I came across a stackoverflow answer explaining that the data won’t disappear this way: these are bind mounts, not named volumes, so the directories have to be removed from the host. I chose to delete them from the host.
# see if docker compose service is running
root@vps ~$> docker compose ls
# output
NAME STATUS CONFIG FILES
gitlab running(1) /root/services/gitlab/docker-compose.yml
# stop it
root@vps ~$> docker compose stop gitlab
# show the mount points - thanks to https://stackoverflow.com/a/30133768
root@vps ~$> gitlab_container_id=$(docker ps -aqf name=gitlab)
root@vps ~$> docker inspect -f '{{ .Mounts }}' $gitlab_container_id
# output is pretty printed
[
{bind /root/services/gitlab/data /var/opt/gitlab rw true rprivate }
{bind /mnt/storage/backups/gitlab/secrets /secret/gitlab/backups rw true rprivate}
{bind /root/services/gitlab/config /etc/gitlab rw true rprivate}
{bind /root/services/gitlab/logs /var/log/gitlab rw true rprivate}
]
# manually delete the mount points on the host
root@vps ~$> rm -r /root/services/gitlab/data /mnt/storage/backups/gitlab/secrets /root/services/gitlab/config /root/services/gitlab/logs
root@vps ~$> docker-compose down --volumes gitlab
# list running compose projects - gitlab should be absent
root@vps ~$> docker compose ls
# re-run ansible playbook
Re-running the ansible playbook did it - gitlab now shows up in traefik. Yay!! 🥳
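To double-check that Traefik is actually routing before opening a browser, a curl against the instance should do (a hypothetical check; gitlab.example.com stands in for the real domain):
root@vps ~$> curl -s -o /dev/null -w '%{http_code}\n' https://gitlab.example.com/
# expect 302 - GitLab redirects anonymous visitors to the sign-in page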
Problem 3 - I can’t log into the Web UI as root
Now, GitLab’s sign-in UI is shown. Woohoo!!
But logging in as root doesn’t work. 😞
Let’s check the logs. Maybe Postgres is becoming the gift that won’t stop giving.
root@vps ~$> cd services/gitlab
root@vps services/gitlab $> less logs/postgresql/current
2023-08-26_11:55:11.53543 LOG: database system is ready to accept connections
2023-08-26_11:55:15.07712 LOG: no match in usermap "gitlab" for user "gitlab" authenticated as "root"
2023-08-26_11:55:15.07717 FATAL: Peer authentication failed for user "gitlab"
2023-08-26_11:55:15.07717 DETAIL: Connection matched pg_hba.conf line 70: "local all all peer map=gitlab"
2023-08-26_11:55:15.17944 LOG: no match in usermap "gitlab" for user "gitlab" authenticated as "root"
2023-08-26_11:55:15.17947 FATAL: Peer authentication failed for user "gitlab"
2023-08-26_11:55:15.17947 DETAIL: Connection matched pg_hba.conf line 70: "local all all peer map=gitlab"
2023-08-26_11:55:15.27469 LOG: no match in usermap "gitlab" for user "gitlab" authenticated as "root"
2023-08-26_11:55:15.27471 FATAL: Peer authentication failed for user "gitlab"
2023-08-26_11:55:15.27472 DETAIL: Connection matched pg_hba.conf line 70: "local all all peer map=gitlab"
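A note on those FATAL lines: with peer authentication, Postgres maps the connecting OS user to a database role via pg_ident.conf, and here something running as OS user root tried to connect as database user gitlab with no matching entry in the "gitlab" usermap. If you want to inspect the map yourself (path per the Omnibus layout - an assumption on my part):
[root@gitlab ~]$> grep -v '^#' /var/opt/gitlab/postgresql/data/pg_ident.conf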
Does the root user exist? Connect to the GitLab Rails console and check (thanks, stackoverflow):
[root@vps ~]$> docker exec -it gitlab.example.com /bin/bash
[root@gitlab ~]$> gitlab-rails console -e production
# output
-----------------------------------------------------------------------------
Ruby: ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
GitLab: 16.3.0 (85a896db163) FOSS
GitLab Shell: 14.26.0
PostgreSQL: 13.11
---------------------------------------------------------[ booted in 24.69s ]
Loading production environment (Rails 7.0.6)
irb(main):002:0> user = User.find_by(id: 1)
=> nil
irb(main):003:0> exit
# back into gitlab container
[root@gitlab ~]$>
Using the gitlab-rake command (stackoverflow answer) also found no root user:
# back into gitlab container
[root@gitlab ~]$> gitlab-rake "gitlab:password:reset[root]"
Unable to find user with username root
Any reason why there’s no root user? Yep.
When the password of the root user doesn’t match the complexity requirements, the root user is not created. I read that in a couple of GitLab issues such as this one and this, and on Stackoverflow.
Hmmm. What are these requirements, and where can I find them?
These requirements are stated in the GitLab docs.
Looking at my initial password, its length satisfied the constraints. But it failed the weak-password check: it contained the predictable word “devops” 😅
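If you want a rough pre-flight check before settling on a new password - a sketch of the two rules I tripped on, not GitLab’s actual validator:
# hypothetical check: minimum length and none of the predictable words
pw='your-candidate-password'
if [ "${#pw}" -ge 8 ] && ! printf '%s' "$pw" | grep -qiE 'devops|gitlab'; then
  echo "looks ok"
else
  echo "too weak"
fi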
Anyway, back to the Password Manager to generate another one. Then:
- Update the .env.
- Re-run the ansible playbook.
- Go into the gitlab container and reset the password:
[root@vps ~]$> docker exec -it gitlab.example.com /bin/bash
root@gitlab:/$> gitlab-rake "gitlab:password:reset[root]"
Enter password:
Confirm password:
Password successfully updated for user with username root.
root@gitlab:/$> exit
[root@vps ~]$>
Go back to the UI and log in with root’s new password.
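And if you’re paranoid, a one-liner to confirm the root user now exists (hypothetical; gitlab-rails runner executes Ruby in the production environment):
[root@vps ~]$> docker exec -it gitlab.example.com gitlab-rails runner 'puts User.find_by(username: "root")&.username'
# expected output: root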