Generally, when you drive a waveshaper (hard-clip or soft-clip) with some gain, even a moderate amount, you have to reduce the output volume: the output tends toward a full-range, nearly rectangular signal, whose power is generally much larger than the input's and needs compensating. This can also be viewed as shrinking the scale of the shape as the input gain increases, but keeping the 3 distinct operations (pre-gain, shape, post-gain) is usually easier to implement in the general case.
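As a rough illustration of the pre-gain, shape, post-gain chain, here is a minimal Python sketch. The function names (`hard_clip`, `soft_clip`, `waveshape`, `rms`), the choice of tanh as the soft clipper and the RMS-matching rule for the post-gain are all assumptions made for the example; they are only one of several reasonable ways to compensate.

```python
import math

def hard_clip(x, limit=1.0):
    # Clamp the sample to [-limit, +limit].
    return max(-limit, min(limit, x))

def soft_clip(x):
    # tanh is used here only as a stand-in for "some soft clipper".
    return math.tanh(x)

def waveshape(samples, pre_gain, shaper, post_gain):
    # The three distinct operations: pre-gain, shape, post-gain.
    return [post_gain * shaper(pre_gain * s) for s in samples]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One cycle of a half-scale sine as test input.
n = 1000
sine = [0.5 * math.sin(2.0 * math.pi * k / n) for k in range(n)]

# Driving the clipper hard pushes the output toward a rectangle,
# so its power ends up well above the input's.
driven = waveshape(sine, pre_gain=8.0, shaper=hard_clip, post_gain=1.0)

# Hypothetical compensation rule: pick the post-gain that matches input RMS.
post = rms(sine) / rms(driven)
matched = waveshape(sine, pre_gain=8.0, shaper=hard_clip, post_gain=post)
```

In a real-time setting the post-gain would more likely be a fixed function of the pre-gain than a measured value; the measurement above is only there to make the power increase visible.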
Polynomial waveshapers require hard-clipping beforehand too, unless you're sure the input signal stays within a specific range; outside that range the output folds back and then grows as the Nth power of the input.
This family of polyshapers has first derivatives that reach 0 at the end points, which relaxes the requirement to band-limit the hard-clipping part (thanks to Andrew Simper for the coefficients). They also have unity gain at 0 and their min and max at ±1.
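To make the hard-clip-then-polynomial idea concrete, here is a small Python sketch. It does not use the coefficients credited above. Instead it assumes the simplest two-term odd polynomial f(x) = x - a·x^N and solves for the values that give unity slope at 0, zero slope at the end points and an output of ±1 there (reading "min and max at ±1" as output extremes, which is an interpretation). Under those assumptions the clip threshold is N/(N-1) and a = 1/(N·(N/(N-1))^(N-1)); for N = 3 that is a threshold of 1.5 and a = 4/27. The names `poly_params` and `polyshape` are hypothetical.

```python
def poly_params(order):
    # Derived for f(x) = x - a * x**order (illustrative family, odd order >= 3):
    #   f'(limit) = 0 and f(limit) = 1  =>  limit = order / (order - 1)
    if order < 3 or order % 2 == 0:
        raise ValueError("order must be an odd integer >= 3")
    limit = order / (order - 1)
    coeff = 1.0 / (order * limit ** (order - 1))
    return limit, coeff

def polyshape(x, order=3, clip=True):
    limit, coeff = poly_params(order)
    if clip:
        # Hard-clip first so the polynomial is never evaluated
        # outside the range where it is monotonic.
        x = max(-limit, min(limit, x))
    return x - coeff * x ** order

# With clipping, a large input settles at the flat end point (about 1.0).
clipped = polyshape(10.0)

# Without clipping, the cubic has already folded back at x = 3
# (3 - (4/27) * 27 is about -1) and keeps growing as x**3 beyond that.
folded = polyshape(3.0, clip=False)
```

Because the slope is already zero where the hard clip takes over, the combined curve has no corner at the threshold, which is the property the text relies on to ease the band-limiting requirement.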