NeurIPS 2020 Optimization Competition (bbochallenge.com)
77 points by mccourt on July 9, 2020 | 23 comments



Interesting. There's been decades of research on Derivative-Free Optimization (DFO) and stochastic/evolutionary algorithms (most of which are derivative-free). They're used in practical applications, but they have been hard to benchmark reliably because solution paths are so dependent on the initial guess and random chance.

This one focuses on maximizing sample efficiency. That's an interesting (and important) metric to benchmark, especially for functions that are computationally expensive to evaluate, like full-on simulations. Sounds like the algorithm would need to be able to efficiently come up with an accurate surrogate model for the expensive function -- which is hard to do in the general case, but if something is known about the underlying function, some specialization is possible.
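To make the surrogate idea concrete, here is a minimal sketch of my own (not from the competition) of a surrogate-assisted loop, using scikit-learn's Gaussian process regressor as the cheap stand-in model and a simple lower-confidence-bound rule to pick the next point; real sample-efficient methods are considerably more careful about the acquisition step, noise, and input scaling.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def expensive_function(x):
        # Stand-in for a costly black-box; each real call might take minutes or hours.
        return np.sin(3 * x) + 0.5 * x ** 2

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(5, 1))                # small initial design
    y = expensive_function(X).ravel()

    for _ in range(15):                                # tiny evaluation budget
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = np.linspace(-2, 2, 201).reshape(-1, 1)  # candidate grid (1-D toy problem)
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - 1.96 * sigma)].reshape(1, -1)  # lower confidence bound
        X = np.vstack([X, x_next])
        y = np.append(y, expensive_function(x_next).ravel())

    print(X[np.argmin(y)], y.min())                    # best point found so far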


The usual solution for benchmarking in general is to run several optimization runs with random initialization, though this has its limitations.

A great optimizer is RBFOpt (Python-based, free, fast, accurate), which does very well and builds a surrogate model while optimizing. It's my go-to optimizer at this point for engineering projects. If anyone knows a better piece of software, let me know.
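For anyone curious what that looks like in practice, here is a minimal sketch along the lines of RBFOpt's documented quickstart (it also needs a MINLP solver such as Bonmin/Ipopt installed; the function and bounds below are just placeholders):

    import numpy as np
    import rbfopt

    def obj_funct(x):
        # Expensive black-box objective to minimize (toy stand-in).
        return x[0] * x[1] - x[2]

    # 3 variables with box bounds; 'R' marks a real variable, 'I' an integer one.
    bb = rbfopt.RbfoptUserBlackBox(3, np.array([0] * 3), np.array([10] * 3),
                                   np.array(['R', 'I', 'R']), obj_funct)
    settings = rbfopt.RbfoptSettings(max_evaluations=50)
    alg = rbfopt.RbfoptAlgorithm(settings, bb)
    val, x, itercount, evalcount, fast_evalcount = alg.optimize()
    print(val, x)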


Fascinating! I skimmed the details in the paper here [1] and my impression is that it looks pretty solid. It uses Gutmann's RBFs as surrogate structures. It also benchmarks pretty well against stochastic/metaheuristic algorithms [2].

However, most examples in the paper were somewhat small, and the problem type is restricted to MINLPs with box constraints (which is still tremendously useful, especially for hyperparameter optimization in simulations).

Just curious, what's the largest problem size you've managed to solve? (no. of constraints, binaries/continuous vars)

[1] http://www.optimization-online.org/DB_FILE/2014/09/4538.pdf

[2] https://www.sciencedirect.com/science/article/pii/S228843001...


Hi, a paper is in the works. We designed a bilevel optimization with RBFOpt on the upper level and a linear problem on the lower level. The number of constraints is in the 1000s for the linear problem. The lower level takes 17 inputs and produces a single output objective. We find that with about 400 evaluations of the lower-level model for the most important objective we converge to a global minimum. I tested GAs, particle swarm, and a couple of others; they didn't even find the solution that RBFOpt found within 800 evaluations.


This blog post lists a bunch of gradient-free optimization packages, some genetic and some Bayesian: https://sigopt.com/blog/comparison-bayesian-packages-hyperpa.... Nothing from the mathematical programming community is in there, so there are definitely other options too (depending on what kind of problems you are trying to address).

Personally, I like Nevergrad (https://facebookresearch.github.io/nevergrad/) a lot for general purpose optimization problems -- I think it is very well implemented and has a variety of tools available. I also think the documentation is appropriately honest about what is and is not known for how these algorithms work in different circumstances.
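To give a flavour of the interface, a minimal sketch along the lines of Nevergrad's quickstart (the optimizer choice and budget here are arbitrary placeholders):

    import numpy as np
    import nevergrad as ng

    def loss(x):
        # x arrives as a numpy array of length 2, per the parametrization below.
        return float(np.sum((x - 0.5) ** 2))

    # OnePlusOne is a simple evolution strategy; any registered optimizer works here.
    optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
    recommendation = optimizer.minimize(loss)
    print(recommendation.value)    # best parameters found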

If you want something Bayesian (sample-efficient) which is very lightweight, I like PySOT (https://pysot.readthedocs.io/en/latest/). Part of why I like it is that it's written by friends of mine, but I also legitimately like its performance across a decent set of problems.

If you want something Bayesian which has corporate support (so that you know it's updated/maintained), I would recommend BoTorch/Ax from Facebook (https://botorch.org/docs/botorch_and_ax). They have done a lot of research for it (a recent preprint is here: https://arxiv.org/pdf/2006.05078.pdf) and have put together a very solid implementation, including considerations for running online optimization problems. I think the documentation is a bit weak, but the software and research are outstanding.

Another corporate-supported option is Optuna (https://optuna.org/) from Preferred Networks. I also know some of the people working on this and I think it is a good implementation of the kernel density estimation strategy for statistical modeling -- preferring lower computational cost and consistent timing over performance. I had difficulties running it in server mode while I was testing, but if you're running locally that will not be a problem.
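For comparison, a minimal Optuna sketch (the objective below is a toy stand-in for a real train-and-validate run):

    import optuna

    def objective(trial):
        # Each trial suggests hyperparameters; by default Optuna uses a TPE sampler.
        x = trial.suggest_float("x", -10.0, 10.0)
        y = trial.suggest_float("y", -10.0, 10.0)
        return (x - 2.0) ** 2 + (y + 1.0) ** 2

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100)
    print(study.best_params, study.best_value)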

As is always the case with optimization strategies, there is no one answer. Different tools perform well in different circumstances. There can be bad tools, but, likely, there will never be a best tool (in my estimation).


Thank you! Lots of resources to explore!


NeurIPS, 2020: "We need more sample-efficient algorithms for finding better hyperparameters that specify how to train computationally expensive deep learning models."

Rich Sutton, 2019: "The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin." (https://news.ycombinator.com/item?id=23781400)

I wonder if in the end simply throwing more and more computation at the problem of finding good hyperparameters will end up working better as computation continues to get cheaper and cheaper.


It certainly does look that way for certain classes of problems, as witnessed by the evolution of GPT language models, where the model gets better through sheer use of compute resources.

For many combinatorial problems, however, improvements in algorithms can often produce bigger strides than just throwing brute-force compute at the problem. Take Mixed Integer Programs (MIPs) -- roughly the optimization equivalent of SAT -- used for airline scheduling, optimal assignment problems and such; a toy MIP is sketched in code after the reference below. In slide 12 [1] (there are other sources that corroborate), the author notes that MIP solver performance between 1988 and 2017 improved by a factor of 2,527,768,000x.

17,120x was due to machine improvements (single core). 147,650x was due to algorithmic improvements (the two factors multiply out to the total above). Multiple cores can also provide a performance boost up to a point, before saturating due to coordination costs. The author notes that "A typical MIP that would have taken 124 years to solve in 1988 will solve in 1 second now".

The biggest improvements in MIP algorithm performance have been due to improvements in solver heuristics (!), because the fastest computations are those that don't have to be performed at all -- i.e. that are eliminated via heuristics.

[1] http://www.focapo-cpc.org/pdf/Linderoth.pdf
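For readers outside OR, here is the promised toy MIP: my own illustrative knapsack (not from the slides), modeled with the PuLP library and solved with its bundled CBC solver. Binary decisions plus linear constraints are exactly what MIP solvers have gotten so much faster at.

    from pulp import LpProblem, LpVariable, LpMaximize, lpSum, value

    # Tiny knapsack: pick items to maximize value without exceeding the weight limit.
    item_values = [10, 13, 7, 8]
    item_weights = [5, 6, 4, 3]
    capacity = 10

    prob = LpProblem("toy_knapsack", LpMaximize)
    x = [LpVariable(f"x{i}", cat="Binary") for i in range(len(item_values))]
    prob += lpSum(v * xi for v, xi in zip(item_values, x))               # objective: total value
    prob += lpSum(w * xi for w, xi in zip(item_weights, x)) <= capacity  # weight constraint
    prob.solve()
    print([int(xi.value()) for xi in x], value(prob.objective))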


> 17,120x was due to machine improvements (single core). 147,650x was due to algorithmic improvements.

That is just... insane. I knew only vaguely that performance had improved significantly for many NP-hard/complete problems in practice, but I did not realize the magnitude of improvement, especially due to better algorithms.

> The biggest improvements in MIP algorithm performance have been due to improvements in solver heuristics (!), because the fastest computations are those that don't have to be performed at all -- i.e. that are eliminated via heuristics.

That is also... remarkable. Thank you for sharing.

I can't help but agree with you :-)

EDIT: Given that most of the "algorithmic" improvements have been due to better solver heuristics, I imagine it should be possible to train meta DL/RL models that learn how to find good heuristics for training DL models with high sample efficiency. Come to think of it, this competition seems to be asking precisely for such "black-box heuristic-guessing" models, so clearly there are people working on it.


Yes, when Bob Bixby (founder of CPLEX and Gurobi, companies that made and still make the fastest MIP solvers in the world) revealed similar numbers a few years ago, many of us practitioners were astounded too.

But those improvements were also a product of lots of smart people, funded by cash-rich industries (e.g. oil & gas, airlines... MIPs are big bucks commercially; I recall at one point a commercial CPLEX license was $100k list), poking at the problem for over 30 years. Many Ph.D.s in operations research and mathematical optimization were generated on this topic alone.


Makes sense.

Note that now we have lots of smart people funded by new cash-rich industries (search, social networks, SaaS, etc.) poking at the problem with "black-box" approaches. It will be interesting to see what comes out of it.


Great example of the fact that different circumstances require different approaches, and the fact that brute force is increasingly impractical for combinatorially complex problems. Thanks for this reference.


Agree. For classes of problems that humans understand poorly (e.g. biology), even better algorithms cannot provide better answers. The optimisation might need to happen elsewhere, such as re-phrasing the problem or gaining a better understanding of related (neighbouring) fields. However, sample efficiency is indeed very helpful in data-deprived areas such as biomedical research.


This sounds like the reason Bertsimas at MIT published his new book.


A very reasonable point and, certainly, the direction that parts of the computational community have embraced over the years. I will use integration as an example: classic computational methods were focused on trying to make strong assumptions about the integrand and significantly reduce the number of integrand evaluations (Gauss quadrature is the main thing that comes to mind). As computation became more accessible/parallelizable, and problems became less analytic, Monte Carlo methods have become more fundamental.
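To make the integration contrast concrete, a small sketch of my own comparing an 8-point Gauss-Legendre rule (few evaluations, strong smoothness assumptions) with a plain Monte Carlo estimate (many cheap, embarrassingly parallel evaluations) of the same integral:

    import numpy as np

    # Estimate the integral of exp(-x^2) over [0, 1] two ways.
    f = lambda x: np.exp(-x ** 2)

    # Gauss-Legendre quadrature: 8 evaluations, exploits smoothness of the integrand.
    nodes, weights = np.polynomial.legendre.leggauss(8)    # nodes/weights on [-1, 1]
    x = 0.5 * (nodes + 1.0)                                # map nodes to [0, 1]
    quad_estimate = 0.5 * np.sum(weights * f(x))

    # Monte Carlo: 100,000 evaluations, but trivially parallel and assumption-free.
    rng = np.random.default_rng(0)
    samples = rng.uniform(0.0, 1.0, size=100_000)
    mc_estimate = f(samples).mean()

    print(quad_estimate, mc_estimate)                      # both approach ~0.7468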

In some distributed computational settings, memory traffic is actually the main bottleneck and redundant computations are executed to reduce the need to send data (a similar situation to the one you aptly describe).

I think that, in the case of hyperparameter/meta-learning optimization (or search, depending on how you think about it), we are at a time right now where the complexity of models which can effectively be put into production is a function of our ability to, at least partially, analyze the space of possible modeling decisions. Will we escape that, and have models whose training cost is less significant than the cost of executing an "intelligent" hyperparameter search process? Maybe ... I am a GP person, so I see potential in clever analysis of circumstances so that RKHS methods (for instance) can be leveraged to simplify the training process. But the current trajectory of the community has been to work on increasingly expensive models, which makes the ability to effectively use them with limited tuning/search cost still relevant.


Yes. Upon further reflection, this competition actually looks like another step in that direction (see this thread: https://news.ycombinator.com/item?id=23784239).

Otherwise I agree, Gaussian Processes are nice and friendly, and work quite well for low-dimensional search (e.g., from a few to hundreds of hyperparameters) under very natural, general assumptions :-)


To do level-2 optimization you still have to run the level-1 problem repeatedly, so you'll only ever spend a small fraction of your resources on meta-optimization. The levels become exponentially more costly.


If anyone wants to do this: I have a Threadripper build with two 2080 Tis. Would be cool to do a group project. Write a PM if you want. I am located in Amsterdam, Europe.


I don't want to participate, but you're not going to find anyone as it stands. There are no PMs on HN, and you have no contact information in your profile (accounts' email addresses are not public).


oops. thx


Seems I forgot my contact details. Hereby: markus @ life - electronic.nl

without the spaces


Search is the problem that solves all other problems.


back to square 1 :-)



