Below is a list of bugs that took me multiple hours, if not days, to troubleshoot and analyze. These are the “epic bugs” that are worth remembering for my career.
Epic Bug 1: G2O Optimization Didn’t Update Vertex
Summary
This bug took several hours (if not days) to debug. It looked as if G2O was ignoring the optimization: the vertex (pose) remained unchanged after running optimizer.optimize(1).
After ruling out common culprits like:
- Vertex not being added correctly
- Edges not referencing the right vertex
- Incorrect Jacobian implementation
- G2O configuration/setup issues
I eventually traced the issue to a subtle but critical misunderstanding in the error term formulation.
Context
In a point-to-line 2D ICP formulation, the error term e is typically calculated as the distance from a point to a line. The simplified (but effective) version of that is:
\[ e = a p_x + b p_y + c \]
where $a$, $b$, and $c$ define the line $ax + by + c = 0$, and $(p_x, p_y)$ is the point.
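The simplified form skips the normalization step. The exact signed distance divides by the length of the line normal $(a, b)$, so the two values coincide when the coefficients are normalized:

\[ d = \frac{a p_x + b p_y + c}{\sqrt{a^2 + b^2}}, \qquad a^2 + b^2 = 1 \;\Rightarrow\; d = e \]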
In my case, the point came from source_cloud, expressed in the body frame. However, the line coefficients a, b, c were fit in the map frame, using nearest neighbors from the target cloud.
The Mistake
I precomputed the error term and stored it inside a struct:
_error[0] = point_line_data_vec_ptr_->at(point_idx_).error_;
And upstream, this was assigned as:
ret.at(idx).error_ = source_pt * point_coord; // Wrong!
The problem? This source_pt is in the body frame, so using it in the error term means the residual is computed as if the body frame were the map/submap frame. On top of that, because the error was precomputed, it is a constant that doesn't depend on the vertex estimate at all: changing the pose never changes the error, so the error has zero gradient with respect to the pose. G2O therefore doesn’t move the pose, even though the edges are correctly wired.
What Threw Me Off
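- A geometric point-to-line distance doesn't depend on which frame you compute it in, as long as the point and the line are expressed in the same frame.
- But an error term like $ap_x + bp_y + c$ is tied to the frame the coefficients were fit in; plugging in a point from another frame still produces a number, just a meaningless one.
- That mistake makes the optimizer behave as if the current pose were already optimal, so it just stays put.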
It was like optimizing with the body frame assumed to be the map frame: a silent bug with no crash or warning, just no progress.
The Fix
Don’t precompute error_ using body frame coordinates. Instead, compute e = ax + by + c dynamically in computeError() using the transformed map-frame point.
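Written out, with the pose treated as $(x, y, \theta)$ (an assumed parameterization for illustration) and $(p_x, p_y)$ the body-frame point, the map-frame error depends on every pose component, which is exactly the dependence the precomputed body-frame error lost:

\[
\begin{aligned}
p^w &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} p_x \\ p_y \end{bmatrix} + \begin{bmatrix} x \\ y \end{bmatrix}, \\
e &= a\,p^w_x + b\,p^w_y + c, \\
\frac{\partial e}{\partial x} &= a, \qquad \frac{\partial e}{\partial y} = b, \qquad
\frac{\partial e}{\partial \theta} = a\left(-\sin\theta\,p_x - \cos\theta\,p_y\right) + b\left(\cos\theta\,p_x - \sin\theta\,p_y\right).
\end{aligned}
\]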
The corrected version is now:
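class EdgeICP2D_PT2Line : public g2o::BaseUnaryEdge<1, double, VertexSE2> {
    // ....

    // Recompute the error from the current pose estimate on every call.
    void computeError() override {
        auto *pose = static_cast<const VertexSE2 *>(_vertices[0]);

        // Source scan point in polar form (range, angle), body frame.
        double r = source_scan_objs_ptr_->at(point_idx_).range;
        double theta = source_scan_objs_ptr_->at(point_idx_).angle;

        // Line coefficients a, b, c, fit in the map frame.
        double a = point_line_data_vec_ptr_->at(point_idx_).params_[0];
        double b = point_line_data_vec_ptr_->at(point_idx_).params_[1];
        double c = point_line_data_vec_ptr_->at(point_idx_).params_[2];

        // Transform the point into the map frame with the current pose,
        // then evaluate e = a * x + b * y + c in that same frame.
        Vec2d pw = pose->estimate() * Vec2d{r * std::cos(theta), r * std::sin(theta)};
        _error[0] = a * pw.x() + b * pw.y() + c;
    }
};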
Lessons Learned
- Optimized pipelines are hard to debug. To maximize vectorization, it’s tempting to compute and store intermediate results in parallel up front. However, when something goes wrong downstream, especially a conceptual math error, the root cause may be a silent assumption made upstream.
- Coordinate frames matter: Even when the math looks simple, subtle frame mismatches can render your optimizer useless.
- Scaled point-to-line errors are not frame-invariant: if you’re using $ap_x + bp_y + c$, you must express the point in the same frame as the line.
- Verbose mode helps: G2O’s setVerbose(true) didn’t show any errors, but the chi² staying constant was a hint that nothing was being optimized.
Epic Bug 2: Compiler Bugs
Here I’m not writing about “giant” bugs, but small tricky ones.
Cannot Find Overloaded Operators
I had a bug where an operator<< was defined in namespace1. Because I spent most of my time developing inside this namespace, I forgot to bring namespace1 into scope in its test, which lives outside the namespace and spells every namespace out explicitly.
Non-Dependent static_assert in if constexpr Always Fails in Older Compilers
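In gcc 14.2, the snippet below compiles: the static_assert(false, ...) in the discarded branch of if constexpr never fires. In gcc 10.1, however, that non-dependent static_assert(false) fails even if the branch is never instantiated. There is a proposal that makes this pattern legal in newer compilers. I’m also posting the code snippet here: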
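#include <type_traits>

// Non-dependent: static_assert(always_false, ...) in a template behaves just like static_assert(false, ...).
inline constexpr bool always_false = false;

// Dependent on T: its value is only checked once the enclosing branch is instantiated.
template <typename T>
inline constexpr bool templated_always_false = false;

// this compiles in gcc 14.2
template <typename Foo>
void my_func() {
    if constexpr (std::is_same_v<Foo, int>) {
        // do something
    } else {
        // Fine in older compilers too: the condition depends on the template parameter Foo,
        // so it is only evaluated if this branch is instantiated. Use this form in older compilers.
        static_assert(templated_always_false<Foo>, "Unsupported Foo type");
        // Rejected by gcc 10.1 even if this branch is never instantiated; accepted by gcc 14.2.
        static_assert(false, "Unsupported Foo type");
    }
}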
- Use gcc --version to check your compiler’s version!
PCL Is A Worm Hole
- PCL does not support in-place filtering. Use a tmp cloud instead:
voxel_filter_.setInputCloud(local_map_);
voxel_filter_.filter(*tmp_cloud_);
// swap or copy the filtered result back into local_map_
local_map_->swap(*tmp_cloud_);