Summary: Intro | Obtaining the proxy server address | Configuring proxy settings – checklist | Other tools
*Fedora / RHEL / CentOS
Initially, I wanted to title this post Welcome to Proxy Hell, because (at least at first) getting the proxy settings right on a VM can feel like a nightmare, especially if you have no idea where to start or, more depressingly, when none of your attempts to fix the problem seem to work. If you work in an office, you have almost certainly come across proxies: it has become standard practice for companies to route their network traffic through a proxy server. The server acts as an intermediary between the private company network and the internet, which both hides the internal traffic from outside eyes and provides a place to enforce access authentication and bandwidth control.
Perhaps the first time you consciously notice a proxy is when your browser’s homepage, instead of taking you to Google.co.uk, lands on Google.in or Google.pl, and the Search button shows up in an unexpected language. That’s the location of your proxy server fooling Google. The second encounter is less amusing: it happens when you start working with, or worse, configuring virtual environments and realise that even basic tasks, like opening a webpage or installing a package, don’t work. For instance, if you followed the Hadoop clustering guide from my last post in an office environment, you wouldn’t have got most of it working without setting up a proxy. Yet the guide conveniently skips the topic with a vague warning: make sure you’re not behind a proxy. So, what do you do if you are?
After having to set up networking on approximately a million VMs (or so it feels) and reading countless Stack Overflow threads on network error messages, I’ve put together a checklist of configuration files that usually need updating before I can start actual work. Although the commands are specific to RHEL/CentOS/Fedora, the high-level procedure also applies to other systems, such as Debian or Ubuntu. I normally get it all set up before doing anything else on the machine, just to avoid diving into Proxy Hell at a later stage of the project.
You can find your proxy server by running the following command from the Windows Command Prompt (I’m not sure about the Mac equivalent):
It will return the name and address of your proxy server. Alternatively, ask your IT administrator or check your company intranet.
The procedure goes like this:
✗ You cannot connect to the internet from your VM.
✔ Configure a network adapter: NAT or Bridged.
✗ You cannot connect to the network from the terminal. Running wget returns ‘Cannot resolve host’ / ‘Time Out’ / ‘Address family not supported by protocol’ error.
✔ This is solved by adding the proxy information to your .bashrc file.
Bashrc is a shell script that runs every time a new interactive shell, such as a new terminal window, is started. You can put there any commands and environment variables that you want available in the terminal. Your own ~/.bashrc can be edited without special rights; modifying the system-wide /etc/bashrc, which applies to all users, requires root access.
Open bashrc from the command line:
To be able to access the internet from the command line, add the following variables to bashrc (replacing <proxy-address> with your own):
export http_proxy=http://<proxy-address>
export https_proxy=http://<proxy-address>
Note that there must be no space after the = sign; otherwise the variable ends up empty.
Open a new terminal window so that the changes in bashrc take effect, then test the connection again, for example with wget.
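The bashrc step can be scripted so that it is safe to re-run on every new VM. This is only an illustration under my own assumptions: the function name add_proxy_to_bashrc, the marker comment and the no_proxy list are hypothetical, and http://proxy.example.com:8080 is a made-up address. It also exports the upper-case HTTP_PROXY/HTTPS_PROXY, since some tools only read those.

```shell
#!/usr/bin/env bash
# Sketch: append proxy variables to a bashrc-style file (hypothetical helper).
# The marker line lets us skip the append if the block is already there.

add_proxy_to_bashrc() {
  local proxy_url="$1"                   # e.g. http://<proxy-address>:<port>
  local rc_file="${2:-$HOME/.bashrc}"    # per-user file by default
  local marker="# proxy settings (added by add_proxy_to_bashrc)"

  # Already added earlier? Do nothing, so re-running is harmless.
  if grep -qF "$marker" "$rc_file" 2>/dev/null; then
    return 0
  fi

  # Unquoted EOF: $proxy_url and $marker expand now; \$ keeps a literal $.
  cat >> "$rc_file" <<EOF
$marker
export http_proxy=$proxy_url
export https_proxy=$proxy_url
export HTTP_PROXY=\$http_proxy
export HTTPS_PROXY=\$https_proxy
export no_proxy=localhost,127.0.0.1
EOF
}

# Usage (made-up address):
#   add_proxy_to_bashrc "http://proxy.example.com:8080"
#   source ~/.bashrc        # or open a new terminal window
#   wget -q -O /dev/null https://www.google.com && echo "proxy OK"
```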
✗ The browser can’t connect to any website.
✔ Tell your browser to use a proxy in its network settings. In my Firefox, the setting is under Preferences > Advanced > Network. There, select Auto-detect proxy settings for this network, confirm the changes and reload the webpage.
Google thinks I’m in the UK now! And I’m not!
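On a VM you rebuild often, the same browser setting can be written to a user.js file in the Firefox profile, which Firefox reads at startup. A sketch under assumptions: the profile directory name differs per installation, set_firefox_proxy is a hypothetical helper, and the pref values used here (network.proxy.type 4 for auto-detect, 2 for an automatic proxy configuration URL such as http://wpad/wpad.dat) are worth double-checking against your Firefox version.

```shell
#!/usr/bin/env bash
# Sketch: set Firefox's proxy mode via user.js (hypothetical helper).
# network.proxy.type: 4 = auto-detect, 2 = automatic proxy configuration URL.

set_firefox_proxy() {
  local profile_dir="$1"     # e.g. ~/.mozilla/firefox/<profile> (name varies)
  local pac_url="${2:-}"     # optional; empty means "auto-detect"
  local userjs="$profile_dir/user.js"

  if [ -z "$pac_url" ]; then
    printf 'user_pref("network.proxy.type", 4);\n' >> "$userjs"
  else
    printf 'user_pref("network.proxy.type", 2);\n' >> "$userjs"
    printf 'user_pref("network.proxy.autoconfig_url", "%s");\n' "$pac_url" >> "$userjs"
  fi
}

# Usage: close Firefox, run one of these, then start Firefox again.
#   set_firefox_proxy ~/.mozilla/firefox/<profile>
#   set_firefox_proxy ~/.mozilla/firefox/<profile> http://wpad/wpad.dat
```

The helper only appends, so remove stale lines from user.js by hand if you switch modes.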
If the browser still gets stuck, try opening http://wpad/wpad.dat. wpad.dat (WPAD stands for Web Proxy Auto-Discovery Protocol) is a proxy auto-config script served on the company network that tells the browser where to direct its internet traffic. If the URL returns a script, that’s good news: the client machine can resolve the WPAD host and reach the proxy configuration. Perhaps you need to be more explicit with your browser and point it directly at the script: paste the address under Automatic proxy configuration URL (in Firefox’s Connection Settings; it will be similar for other browsers) and try again.
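The wpad.dat check can also be done from the terminal. A minimal sketch, assuming curl is installed; check_wpad is a hypothetical helper and its return codes are an arbitrary convention. It relies on the fact that every proxy auto-config script defines a FindProxyForURL function.

```shell
#!/usr/bin/env bash
# Sketch: fetch a WPAD/PAC file and check it looks like a proxy auto-config script.
# Return codes (our own convention): 0 = PAC found, 1 = fetch failed, 2 = not a PAC.

check_wpad() {
  local url="${1:-http://wpad/wpad.dat}"
  local body
  # --noproxy '*': connect directly, so a half-configured http_proxy can't interfere
  # -f: treat HTTP errors (404 etc.) as failures; -sS: quiet, but still print errors
  # -m 10: give up after 10 seconds instead of hanging
  if ! body=$(curl -fsS --noproxy '*' -m 10 "$url"); then
    echo "cannot fetch $url" >&2
    return 1
  fi
  # PAC scripts must define FindProxyForURL(url, host).
  case "$body" in
    *FindProxyForURL*)
      echo "PAC script found at $url"
      return 0 ;;
  esac
  echo "$url responded, but does not look like a PAC script" >&2
  return 2
}

# Usage:
#   check_wpad                          # tries http://wpad/wpad.dat
#   check_wpad http://<proxy-host>/wpad.dat
```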
✗ Cannot install any packages with yum (package manager). Typical error messages: ‘Cannot find a valid baseurl for repo’ or ‘Couldn’t connect to host’.
✔ Specify the proxy URL and port (usually 80) with the proxy= option in the [main] section of /etc/yum.conf.
Save your changes and re-run the yum command. Now yum can get the packages from the web just fine!
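If you rebuild VMs often, the yum.conf edit can be scripted. This sketch (hypothetical helper set_yum_proxy, example address) replaces any existing proxy= line in the [main] section; it assumes the file already has a [main] header and needs root for the real /etc/yum.conf. yum.conf also accepts proxy_username= and proxy_password= for proxies that require authentication, and on newer Fedora/RHEL releases that use dnf, the same proxy= option goes in /etc/dnf/dnf.conf.

```shell
#!/usr/bin/env bash
# Sketch: set proxy= in the [main] section of yum.conf (hypothetical helper).

set_yum_proxy() {
  local proxy_url="$1"                  # e.g. http://<proxy-address>:<port>
  local conf="${2:-/etc/yum.conf}"
  local tmp rc=0
  tmp=$(mktemp) || return 1
  # Print the new proxy= right after [main], drop any old proxy= inside [main],
  # and leave every other section untouched.
  awk -v p="proxy=$proxy_url" '
    /^\[main\]/             { print; print p; in_main = 1; next }
    /^\[/                   { in_main = 0 }
    in_main && /^proxy[ \t]*=/ { next }
    { print }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf" || rc=1   # cat keeps file permissions
  rm -f "$tmp"
  return "$rc"
}

# Usage (as root, made-up address):
#   set_yum_proxy "http://proxy.example.com:8080"
#   yum makecache
```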
Cloudera Manager: you set up a proxy under Administration > Settings > Network.
pip (Python’s package manager): append proxy to the install command, e.g.
pip install tensorflow --proxy http://<proxy-address>:<port>
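To avoid typing --proxy on every install, the proxy can live in pip’s configuration file instead; pip also honours the http_proxy/https_proxy variables from the bashrc step. A sketch assuming the per-user config path ~/.config/pip/pip.conf; write_pip_proxy_conf is a hypothetical name, and it overwrites the file, so merge by hand if you already have one.

```shell
#!/usr/bin/env bash
# Sketch: store the proxy in pip's per-user config (hypothetical helper).

write_pip_proxy_conf() {
  local proxy_url="$1"                               # e.g. http://<proxy-address>:<port>
  local conf="${2:-$HOME/.config/pip/pip.conf}"      # per-user pip config on Linux
  mkdir -p "$(dirname "$conf")" || return 1
  # Overwrites any existing file.
  printf '[global]\nproxy = %s\n' "$proxy_url" > "$conf"
}

# Usage (made-up address):
#   write_pip_proxy_conf "http://proxy.example.com:8080"
#   pip install tensorflow      # no --proxy needed any more
```

In pip versions that have the config subcommand, `pip config set global.proxy http://<proxy-address>:<port>` should achieve the same from pip itself.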
Maven (a build automation tool for Java projects): add the proxy details to settings.xml in its installation folder, e.g. /opt/apache-maven-3.3.9/conf/settings.xml
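The proxy goes into a <proxies> element inside the top-level <settings> element of settings.xml (the per-user ~/.m2/settings.xml works too). The hypothetical helper below just prints such a section for you to paste in; the ids, the example host and port, and the nonProxyHosts value are placeholders. It emits one entry for http and one for https, on the assumption that Maven matches a proxy to the repository’s protocol; check this against your Maven version.

```shell
#!/usr/bin/env bash
# Sketch: print a <proxies> section for Maven's settings.xml (hypothetical helper).

maven_proxy_xml() {
  local host="$1" port="$2" proto
  echo "<proxies>"
  # One entry per protocol; ids must be unique within settings.xml.
  for proto in http https; do
    cat <<EOF
  <proxy>
    <id>office-proxy-$proto</id>
    <active>true</active>
    <protocol>$proto</protocol>
    <host>$host</host>
    <port>$port</port>
    <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
  </proxy>
EOF
  done
  echo "</proxies>"
}

# Usage (made-up host): paste the output inside <settings>...</settings>
#   maven_proxy_xml proxy.example.com 8080
```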
Anaconda: in the .condarc file (usually in your home directory). If the file doesn’t exist, create it with:
conda config --add channels r
(this also adds the r channel to your configuration). Then add the proxy information in the following format:
proxy_servers:
  http: http://<proxy-address>:<port>
  https: http://<proxy-address>:<port>
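The .condarc edit can be scripted as well. A sketch (hypothetical helper add_conda_proxy, example address) that appends the proxy_servers block in the format above and refuses to add it twice. Alternatively, `conda config --set proxy_servers.http http://<proxy-address>:<port>` (and likewise for https) should write the same keys.

```shell
#!/usr/bin/env bash
# Sketch: append proxy_servers to a .condarc file (hypothetical helper).

add_conda_proxy() {
  local proxy_url="$1"                  # e.g. http://<proxy-address>:<port>
  local condarc="${2:-$HOME/.condarc}"
  # A second top-level proxy_servers key would be invalid YAML, so stop here.
  if grep -q '^proxy_servers:' "$condarc" 2>/dev/null; then
    echo "proxy_servers already set in $condarc; edit it by hand" >&2
    return 1
  fi
  cat >> "$condarc" <<EOF
proxy_servers:
  http: $proxy_url
  https: $proxy_url
EOF
}

# Usage (made-up address):
#   add_conda_proxy "http://proxy.example.com:8080"
```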
A note on my relaxed use of root commands and changes to configuration files: remember, this is a virtual environment, not a production system. None of the above should be considered the safest procedure, or even the most efficient one. The checklist is meant as a guide for non-sysadmins to get the networking setup out of the way quickly and easily.