How we went from theory to gates in optimizing the frame synchronization for Opulent Voice.
Finding the beginning of a data frame in a noisy radio channel is like searching for a needle in a haystack. Except, the haystack is constantly shifting, and some of the hay looks suspiciously like needles. For the Opulent Voice digital voice protocol, we’ve tackled this physics challenge on two fronts. First, we mathematically optimized our synchronization word. Second, we upgraded our FPGA detector from hard-decision matching to soft-decision correlation in order to take advantage of the better mathematics of the optimized codeword.
When we first designed the frame structure for Opulent Voice, we experimented with a familiar tool used in legacy digital voice protocols, called Barker codes. These binary sequences, discovered by R.H. Barker in 1953, have near-perfect autocorrelation properties. This means that when you slide them past themselves, you get a sharp peak at alignment and minimal response elsewhere. The problem? The longest Barker code is only 13 bits, and we needed 24 bits.
The textbook solution is concatenation. Lucky for us, we could stick an 11-bit Barker code together with a 13-bit Barker code, and get 24-bits output. This gives you 0xE25F35, with a Peak Sidelobe-to-Mainlobe Ratio (PSLR) of 3:1. Respectable, but we realized that this wasn’t necessarily optimal for 24 bits.
The answer required brute force. With 2^24 = 16,777,216 possible sequences, modern computers can exhaustively search the entire space in about 90 seconds. The results were illuminating. 6,864 sequences achieve the optimal PSLR of 8:1. This was nearly three times better than our concatenated Barker code.
We can think of this like antenna directivity. A PSLR of 8:1 means our “main lobe” (the correlation peak when perfectly aligned) is eight times stronger than any “sidelobe” (responses at other alignments). Higher PSLR translates directly to better false-detection rejection, especially in multipath environments where delayed signal copies can trigger spurious sync detections.
For Opulent Voice, we selected 0x02B8DB from the optimal set. Besides having the best possible PSLR, it has good DC balance (11 ones, 13 zeros) and a maximum run length of 6 zeros. This woulld be important for tracking loop stability in minimum shift keying modulation. The mnemonic is “oh to be eight dB” for the hex digits.
Having an optimal sync word is only half the battle. The detector implementation matters just as much.
Our original frame sync detector used Hamming distance. We counted up how many bits differed between the received pattern when compared to our known sync word. If fewer than some threshold differ, we declared that sync was found. This works fine for strong signals, but there’s a fundamental problem buried lower in the noise. By the time bits reach the detector, the minimum shift keying demodulator has already made “hard decisions”. That means that each symbol has been quantized to a definitive 0 or 1.
Hard decisions throw away valuable information. The demodulator might have been 99% confident about one bit but only 51% confident about another, yet both become equally weighted in the Hamming distance calculation. In a D&D analogy, it’s like reducing your attack rolls to just “hit” or “miss” without tracking the actual roll. You lose the ability to distinguish a near-miss from a catastrophic fumble. Now, if all we had to work with was hard decisions, then this is the best we could do. But there’s something really neat about our demodulator. It also decodes how confident is is of that 1 or 0. The soft decision metric is already produced and already available as a demodulator output.
The solution to our sync word detection optimizaiton problem is to use soft-decision correlation. Instead of binary bits, we work with signed values that indicate both the decision and the confidence. A value of +7 means “definitely a 0 with high confidence,” while +1 means “probably a 0 but not very sure.” Negative values indicate 1s.
The math is elegant. For each of the 24 sync word positions, we multiply the soft sample by +1 if we expect a ‘0’ or -1 if we expect a ‘1’, then sum all 24 products. Perfect alignment produces a large positive value; misalignment produces values near zero. The peak stands out sharply from the noise floor. We already had the information. We just had to use it.
The new `frame_sync_detector.vhd` in our encoder-dev branch implements soft-decision correlation with several key features:
First, we have parallel data paths. We maintain two shift registers. One is for soft decisions (24 × 16-bit signed values) and one is for hard decisions (24 bits). The soft path feeds the correlator; the hard path handles byte assembly after sync is found. This lets us have our cake and eat it too.
Second, we have polarity-aware correlation. This sounds fancy, but it’s a simple process. Our minimum shift keying demodulator uses a convention where positive soft values indicate ‘0’ bits and negative values indicate ‘1’ bits. The correlator accounts for this. When the sync word expects a ‘1’, we subtract the sample (making a negative contribution become positive). This detail matter. If we get the polarity wrong then our correlator becomes an anti-correlator.
Third, we have frame tracking with a flywheel. Once locked, we don’t search for sync on every bit. Instead, we count through the known frame length and verify sync where we expect it. This “flywheel” approach dramatically reduces computation and provides robustness against brief interference. We maintain lock through up to two consecutive missed syncs before returning to full search mode.It takes three consecutive successful sync word detections to declare lock. We may update these numbers later on if they are too small or too big. This is a good start and is similar to what commercial communications systems implement. We’re on the right track here.
Fourth, we have adaptive thresholds. Our HUNTING mode uses a stricter threshold than LOCKED mode. When searching, we need high confidence to avoid false positives. Once locked and tracking, we can be more forgiving. If sync is in roughly the right place with reasonable correlation, we stay locked. We have to really lose track of our frame boundaries in order to go back to HUNTING, where we search through every single bit we receive with a sliding window and correlator to find our optimized pattern.
Fifth, we have some debug instrumentation. The design exports correlation values and peak tracking signals, essential for threshold calibration. We can’t set thresholds blindly; they depend on your ADC scaling, Costas loop gains, and signal levels. We need to know what the correlator calculated and we need to know the peak detected. Otherwise we might be way off on thresholds.
The combination of optimized sync word and soft-decision detection provides measurable improvements.
For pure AWGN channels, correlation detection offers roughly 2-3 dB improvement over Hamming distance at moderate SNR. The optimal sync word provides a slight additional edge at very low SNR compared to concatenated Barker. This means that we can deliver the same performance with about half the signal power. That’s not bad. But, the real payoff comes in multipath environments. With delayed echoes from terrain features, the 8:1 PSLR sync word dramatically outperforms the 3:1 concatenated Barker code. The suppressed sidelobes mean echoes are far less likely to trigger false sync detection. Combined with correlation-based detection, we see substantial improvement in frame acquisition reliability under realistic VHF/UHF propagation conditions.
If you’re building an Opulent Voice implementation, here’s how to calibrate the correlation thresholds.
First, connect the debug correlation output to an ILA or register interface. Second, transmit known sync words and observe the peak correlation value. Third, set `HUNTING_THRESHOLD` to 70-80% of this observed peak. Finally, set `LOCKED_THRESHOLD` to 40-50% of the observed peak.
The defaults in the VHDL (10,000 for hunting, 5,000 for locked) were conservative starting points. Your actual values will depend on your particular signal chain in your design.