The proposed procedure is listed as pseudocode in Algorithm 3.1. Search starts from a user-supplied initial guess solution, v, that is feasible. From the initial solution
y=v, each yi is moved to be closer to its corresponding unsharpened (target) data
point xi. Every such move will reduce the Lα(y,x) objective function, but no point
may be moved in a way that causes constraint violations. In this way feasibility is guaranteed throughout. The algorithm cycles through the elements of y for as long as feasibility-preserving improvements can be made. The procedure is greedy in the sense that each sharpened data point is moved individually to improve the objective function as much as possible, without consideration of how the current move will impact future moves of other points.
Step one in the search is initialization. The original data x is sorted in ascending order, and the solution is initialized to y = v. The initial solution may be a simplistic choice, but it must satisfy the constraint. If the kernel function itself satisfies the operative shape constraints, an easy way to initialize is to let v have all of its data points at the same location. This will cause the KDE to have the same shape as the kernel function. When using this initialization strategy, the default choice for the kernels’ location is the location of the highest mode in the unconstrained estimate. In
other words, if m0 is the location of the highest mode, we set v = m0 1, where 1 is the vector of ones. This starting solution has been found to perform adequately in most circumstances.
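The default initialization can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes a Gaussian kernel, and it approximates the highest mode by the maximizer of the unconstrained KDE on a regular grid. The names `gaussian_kde`, `initial_guess`, and `grid_size` are hypothetical.

```python
import numpy as np

def gaussian_kde(grid, points, h):
    """Evaluate a Gaussian KDE at each grid location (kernel choice is an assumption)."""
    z = (grid[:, None] - points[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

def initial_guess(x, h, grid_size=512):
    """Return v = m0 * 1, with m0 the highest mode of the unconstrained KDE."""
    # The grid extends 3 bandwidths past the data; this margin is an arbitrary choice.
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_size)
    density = gaussian_kde(grid, x, h)
    m0 = grid[np.argmax(density)]  # grid approximation of the highest mode
    return np.full(len(x), m0)

x = np.sort(np.array([0.1, 0.2, 0.25, 0.3, 2.0]))
v = initial_guess(x, h=0.3)
```

Because every entry of `v` is the same location, the KDE built from `v` is a single kernel centred at that location, so it inherits the kernel's shape.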
The second step is to prepare for moving the yi. The target values xi, i = 1, ..., n,
will also be called the home positions for their corresponding yi values. The solution
is improved during the algorithm by moving each yi toward home. If a point reaches
home, it stops moving. If the constraint prevents a point from moving closer to home, that point is said to be pinned.
In preparation for moving the points, y is first sorted, to produce a sensible matching to x. After this, each point is examined to determine whether or not it is
moveable. A point is considered moveable if it is neither pinned nor at home. The total number of moveable points is M. The algorithm terminates when M = 0; at this point no further moves can be made without either worsening the solution or violating the constraint.
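The moveability test can be written in a few lines. The sketch below takes the pinned status as given, since how a point is detected as pinned depends on the constraint; the function name `find_moveable` and the tolerance `tol` used to decide that a point is at home are assumptions made for illustration.

```python
import numpy as np

def find_moveable(y, x, pinned, tol=1e-12):
    """Boolean mask of points that are neither at home nor pinned."""
    at_home = np.abs(y - x) <= tol
    return ~at_home & ~pinned

x = np.array([1.0, 2.0, 3.0, 4.0])             # home positions (sorted)
y = np.array([1.0, 2.5, 2.6, 4.0])             # current sharpened points (sorted)
pinned = np.array([False, True, False, False])  # hypothetical pinned flags
moveable = find_moveable(y, x, pinned)
M = int(moveable.sum())  # the search would stop once M reaches 0
```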
Step three in the algorithm is the core of the method: a sweep or pass through all M moveable points in y. In each pass, every moveable point is moved closer to its target position, or left in place if no feasible move is found. The movement of each point is done by grid search over the interval [yi, xi]. Grid search is performed by
dividing the search interval into S steps. If any moves are made in a pass, S is left unmodified and another pass begins after re-sorting y and re-counting the number of moveable points. If a complete sweep results in no moved points, the value of S is doubled before the next pass, permitting smaller moves to be made on a finer grid.
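A single grid-search move might look like the sketch below. It rests on two assumptions: the grid points are tried from home back toward the current position, so that the first feasible candidate is the largest feasible step, and feasibility is supplied as a callback. The names `grid_move` and `is_feasible`, and the toy constraint in the usage lines, are hypothetical.

```python
import numpy as np

def grid_move(y, i, target, S, is_feasible):
    """Move y[i] toward target on a grid of S steps.

    Returns the feasible grid point closest to target, or the current
    position if no grid point is feasible.
    """
    start = y[i]
    for k in range(S, 0, -1):
        # k == S is the home position itself; use it exactly to avoid rounding.
        candidate = target if k == S else start + (k / S) * (target - start)
        trial = y.copy()
        trial[i] = candidate
        if is_feasible(trial):
            return candidate
    return start

# Toy stand-in constraint (not from the text): no point may exceed 1.2.
is_feasible = lambda t: t.max() <= 1.2
y = np.array([0.0])
pos_s1 = grid_move(y, 0, 2.0, 1, is_feasible)  # only candidate is home (2.0)
pos_s2 = grid_move(y, 0, 2.0, 2, is_feasible)  # candidates 2.0, then 1.0
```

With S = 1 the only candidate is the home position, so the point either goes all the way home or stays put. Doubling S adds intermediate candidates.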
An important feature of the algorithm is that S is initialized to 1. This means that during the first sequence of passes through the data, there is an attempt to move points all the way home directly in one step. Doing so saves computation time since in many cases a large portion of the points can move home immediately without violating the constraint. By successively doubling S only when moves cannot be made, more thorough searches are deferred until the later stages, when a small number of points are being moved up against the constraint boundary. This strategy reduces the greediness of the method, preventing points from becoming pinned too soon and thereby conferring a considerable performance improvement.
Algorithm 3.1: A greedy data sharpening algorithm (improve).
Input: A feasible initial guess, v; the data, x; a bandwidth, h
Output: A feasible solution y with Lα(y,x) ≤ Lα(v,x)
Initialize
    Set y ← v.
    Let S be the number of grid search steps. Set S ← 1.
Prepare for the first sweep
    Sort y.
    Find the set of moveable points (M of them).
while M > 0
    Sweep through the points
    for each moveable point
        Use grid search with S steps to move the point closer to home, while maintaining feasibility.
    Prepare for the next sweep
    if at least one point has moved
        Sort y.
        Find the set of moveable points (M of them).
    else
        Set S ← 2S.
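The listing can be turned into a short runnable sketch under stated assumptions. The constraint is passed in as a hypothetical callback `is_feasible`. The listing does not say how a point is declared pinned, so this sketch treats every point that is still away from home once S exceeds a cap `s_max` as pinned. The cap, the tolerance `tol`, and the toy range constraint in the usage lines are illustrative choices, not part of the method as described.

```python
import numpy as np

def improve(v, x, is_feasible, s_max=64, tol=1e-12):
    """Greedy sharpening sketch: move each y[i] toward x[i] while staying feasible."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(v, dtype=float))
    S = 1
    away = np.abs(y - x) > tol          # points not yet at home
    while away.any() and S <= s_max:    # beyond s_max, remaining points count as pinned
        moved = False
        for i in np.flatnonzero(away):
            for k in range(S, 0, -1):   # try the largest step first
                cand = x[i] if k == S else y[i] + (k / S) * (x[i] - y[i])
                trial = y.copy()
                trial[i] = cand
                if is_feasible(trial):
                    y[i] = cand
                    moved = True
                    break
        if moved:
            y = np.sort(y)              # re-match points to targets
            away = np.abs(y - x) > tol
        else:
            S *= 2                      # refine the grid only when stuck
    return y

# Toy stand-in constraint (not from the text): the range of y may not exceed 1.5.
is_feasible = lambda t: t.max() - t.min() <= 1.5
x = np.array([0.0, 1.0, 2.0, 3.0])
v = np.full(4, 1.5)                     # all points at one location: feasible
y = improve(v, x, is_feasible)
```

In this toy run the first two points reach home on the first pass, after which the remaining two cannot move without widening the range. The result illustrates the greedy behaviour described above: earlier moves restrict later ones.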
Note also that the sorting step is performed before every pass through the data. Re-sorting the points at each step improves the performance of the algorithm because sometimes points cross over one another, in which case both will be closer to home, and the objective function will be decreased, if they switch target points.
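The effect of re-matching can be checked on two points that have crossed. The sketch assumes an objective of the form sum of |yi − xi|^α with α = 1, which is a guess at the form of Lα (it is not defined in this section). The function name `objective` is hypothetical.

```python
import numpy as np

def objective(y, x, alpha=1.0):
    """Assumed form of the objective: sum of |y_i - x_i|**alpha."""
    return float(np.sum(np.abs(y - x) ** alpha))

x = np.array([1.0, 2.0])          # sorted targets
y = np.array([2.5, 0.5])          # the two sharpened points have crossed
before = objective(y, x)          # |2.5 - 1| + |0.5 - 2|
after = objective(np.sort(y), x)  # |0.5 - 1| + |2.5 - 2|
```

Sorting y swaps the targets of the two points, and each ends up closer to its new home than it was to its old one.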
These ideas are illustrated in Figure 3.1, which shows how the solution develops over three passes for a small example with only five data points. The constraint in this example is unimodality. The intermediate positions of the sharpened points are shown after each pass, and a line joins each point to its target. Each line is labeled to show the status of its corresponding sharpened point. Lines labeled with numbers correspond to moveable points, and the numbers indicate the order in which points are to be moved. Lines labeled with h correspond to points at home, while those labeled with p correspond to points that are pinned. After the first pass (the upper right plot in the figure), the sorting step has caused two points to switch targets. The thick grey lines indicate the points’ new targets after re-matching. In this example the search terminates after three passes, with three points pinned and two points at home.
[Figure 3.1 panels and line labels: Start (1, 2, 4, 5, 3); After 1 pass (2, 3, h, 4, 1); After 2 passes (h, p, h, p, 1); After 3 passes (h, p, h, p, p).]
Figure 3.1: A small example illustrating the greedy sharpening method. Solid/dashed lines show the sharpened/unsharpened estimates. Open/filled circles show the sharpened/unsharpened data. Grey lines join each unsharpened point to its target and indicate the status of the point.