@@ -12,15 +12,17 @@ Theia has a full Structure-from-Motion pipeline that is extremely efficient. Our
overall pipeline consists of several steps. First, we extract features (SIFT is
the default). Then, we perform two-view matching and geometric verification to
obtain relative poses between image pairs and create a :class:`ViewGraph`. Next,
- we perform global pose estimation with global SfM. Global SfM is different from
- incremental SfM in that it considers the entire view graph at the same time
- instead of incrementally adding more and more images to the
- :class: `Reconstruction `. Global SfM methods have been proven to be very fast
- with comparable or better accuracy to incremental SfM approaches (See
- [JiangICCV ]_, [MoulonICCV ]_, [WilsonECCV2014 ]_), and they are much more readily
- parallelized. After we have obtained camera poses, we perform triangulation and
- :class: `BundleAdjustment ` to obtain a valid 3D reconstruction consisting of
- cameras and 3D points.
+ we perform either incremental or global SfM. Incremental SfM is the standard
+ approach that adds one image at a time to grow the reconstruction. While this
+ method is robust, it is not scalable because it requires repeatedly running
+ expensive bundle adjustment. Global SfM is different from incremental SfM in
+ that it considers the entire view graph at the same time instead of
+ incrementally adding more and more images to the :class:`Reconstruction`. Global
+ SfM methods have been proven to be very fast with comparable or better accuracy
+ to incremental SfM approaches (see [JiangICCV]_, [MoulonICCV]_,
+ [WilsonECCV2014]_), and they are much more readily parallelized. After we have
+ obtained camera poses, we perform triangulation and :class:`BundleAdjustment` to
+ obtain a valid 3D reconstruction consisting of cameras and 3D points.

The first step towards creating a reconstruction is to determine images which
view the same objects. To do this, we must create a :class:`ViewGraph`.
@@ -29,8 +31,6 @@ view the same objects. To do this, we must create a :class:`ViewGraph`.
#. Match features to obtain image correspondences.
#. Estimate camera poses from two-view matches and geometries.

- .. TODO :: Insert figure.
-
#1. and #2. have been covered in other sections, so we will focus on creating a
reconstruction from two-view matches and geometry. First, we will describe the
fundamental elements of our reconstruction.
@@ -40,7 +40,7 @@ Reconstruction

.. class:: Reconstruction

- .. TODO :: Insert figure.
+ .. image:: pisa.png

At the core of our SfM pipeline is an SfM :class:`Reconstruction`. A
:class:`Reconstruction` is the representation of a 3D reconstruction consisting
@@ -117,8 +117,6 @@ ViewGraph

.. class:: ViewGraph

- .. TODO :: INSERT FIGURE HERE
-
A :class:`ViewGraph` is a basic SfM construct that is created from two-view
matching information. Any pair of views that have a view correlation form an
edge in the :class:`ViewGraph` such that the nodes in the graph are
@@ -296,9 +294,83 @@ In addition to typical getter/setter methods for the camera parameters, the
according to the camera orientation in 3D space. The returned vector is not
unit length.

+ Incremental SfM Pipeline
+ ========================
+
+ .. image:: incremental_sfm.png
+
+ The incremental SfM pipeline closely follows the pipelines of `Bundler
+ <http://www.cs.cornell.edu/~snavely/bundler/>`_ [PhotoTourism]_ and `VisualSfM
+ <http://ccwu.me/vsfm/>`_ [VisualSfM]_. The method begins by estimating the 3D
+ structure and camera poses of two cameras based on their relative pose. Then
+ additional cameras are added sequentially and new 3D structure is estimated as
+ new parts of the scene are observed. Bundle adjustment is repeatedly performed
+ as more cameras are added to ensure high-quality reconstructions and to avoid
+ drift.
+
+ The incremental SfM pipeline is as follows:
+
+ #. Choose an initial camera pair to reconstruct.
+ #. Estimate the 3D structure of the scene.
+ #. Bundle adjustment on the 2-view reconstruction.
+ #. Localize a new camera to the current 3D points. Choose the camera that
+    observes the most 3D points currently in the scene.
+ #. Estimate new 3D structure.
+ #. Bundle adjustment if the model has grown by more than 5% since the last
+    bundle adjustment.
+ #. Repeat steps 4-6 until all cameras have been added.
+
+ Incremental SfM is generally considered to be more robust than global SfM
+ methods; however, it requires many more rounds of bundle adjustment (which is
+ very costly), so incremental SfM is not as efficient or scalable.
+
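+ A rough sketch of the control flow for steps 4-6 is shown below. The helper
+ functions and the ``Reconstruction`` placeholder are hypothetical stand-ins for
+ illustration, not Theia's actual implementation.
+
+ .. code-block:: c++
+
+   // Hypothetical helpers standing in for the real localization,
+   // triangulation, and bundle adjustment stages.
+   struct Reconstruction;
+   bool HasUnlocalizedViews(const Reconstruction& reconstruction);
+   int NumViews(const Reconstruction& reconstruction);
+   void LocalizeBestView(Reconstruction* reconstruction);
+   void TriangulateNewTracks(Reconstruction* reconstruction);
+   void BundleAdjust(Reconstruction* reconstruction);
+
+   void GrowReconstruction(Reconstruction* reconstruction) {
+     int num_views_at_last_ba = NumViews(*reconstruction);
+     while (HasUnlocalizedViews(*reconstruction)) {
+       // Step 4: localize the view that observes the most existing 3D points.
+       LocalizeBestView(reconstruction);
+       // Step 5: triangulate new structure observed by the newly added view.
+       TriangulateNewTracks(reconstruction);
+       // Step 6: run a full bundle adjustment only when the model has grown by
+       // more than 5% since the last bundle adjustment.
+       const double growth_percent =
+           100.0 * (NumViews(*reconstruction) - num_views_at_last_ba) /
+           num_views_at_last_ba;
+       if (growth_percent > 5.0) {
+         BundleAdjust(reconstruction);
+         num_views_at_last_ba = NumViews(*reconstruction);
+       }
+     }
+   }
+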
+ .. member:: double ReconstructionEstimatorOptions::multiple_view_localization_ratio
+
+   DEFAULT: ``0.8``
+
+   If M is the maximum number of 3D points observed by any view, we want to
+   localize all views that observe > M * multiple_view_localization_ratio 3D
+   points. This allows for multiple well-conditioned views to be added to the
+   reconstruction before needing bundle adjustment.
+
+ .. member:: double ReconstructionEstimatorOptions::absolute_pose_reprojection_error_threshold
+
+   DEFAULT: ``8.0``
+
+   When adding a new view to the current reconstruction, this is the
+   reprojection error that determines whether a 2D-3D correspondence is an
+   inlier during localization.
+
+ .. member:: int ReconstructionEstimatorOptions::min_num_absolute_pose_inliers
+
+   DEFAULT: ``30``
+
+   Minimum number of inliers for absolute pose estimation to be considered
+   successful.
+
+ .. member:: double ReconstructionEstimatorOptions::full_bundle_adjustment_growth_percent
+
+   DEFAULT: ``5.0``
+
+   Bundle adjustment of the entire reconstruction is triggered when the
+   reconstruction has grown by more than this percent. That is, if we last ran
+   BA when there were K views in the reconstruction and there are now N views,
+   then the model has grown by G = 100 * (N - K) / K percent. We run bundle
+   adjustment only if G is greater than this variable, which is specified as a
+   percentage (e.g., 5.0 = 5%).
+
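+   For example, if the last full bundle adjustment ran when the reconstruction
+   had K = 40 views and it now has N = 44 views, then
+   G = 100 * (44 - 40) / 40 = 10%, which exceeds the default of 5.0, so a full
+   bundle adjustment is triggered.
+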
+ .. member:: int ReconstructionEstimatorOptions::partial_bundle_adjustment_num_views
+
+   DEFAULT: ``20``
+
+   During incremental SfM we run "partial" bundle adjustment on the most
+   recent views that have been added to the 3D reconstruction. This parameter
+   controls how many views should be part of the partial BA.
+
Global SfM Pipeline
===================

+ .. image:: global_sfm.png
+
The global SfM pipelines in Theia follow a general procedure of filtering
outliers and estimating camera poses or structure. Removing outliers can help
increase performance dramatically for global SfM, though robust estimation
@@ -325,8 +397,8 @@ follows:
.. class:: ReconstructionEstimator

This is the base class for which all SfM reconstruction pipelines derive
- from. The reconstruction estimation type can be specified at runtime, though
- currently only ``NONLINEAR `` is implemented.
+ from. The reconstruction estimation type can be specified at runtime
+ (currently ``NONLINEAR`` and ``INCREMENTAL`` are implemented).

.. function:: ReconstructionEstimator::ReconstructionEstimator(const ReconstructorEstimatorOptions& options)

@@ -583,15 +655,17 @@ Estimating Global Positions
===========================

Positions of cameras may be estimated simultaneously after the rotations are
- known. We use a nonlinear optimization to estimate camera positions based. Given
- pairwise relative translations from :class: `TwoViewInfo ` and the estimated
- rotation, the constraint
+ known. We use either a linear or a nonlinear optimization to estimate camera
+ positions.
+
+ Given pairwise relative translations from :class:`TwoViewInfo`
+ and the estimated rotation, the constraint

.. math:: R_i * (c_j - c_i) = \alpha_{i,j} * t_{i,j}

- Where :math: `\alpha _{i,j} = ||c_j - c_i||^ 2 `. This ensures that we optimize for
- positions that agree with the relative positions computed in two-view
- estimation.
+ is used to determine the global camera positions, where :math:`\alpha_{i,j} =
+ ||c_j - c_i||^2`. This ensures that we optimize for positions that agree with
+ the relative positions computed in two-view estimation.

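A minimal sketch of how one such pairwise term could be expressed as a Ceres
residual is shown below. This is only an illustration of the constraint above,
not Theia's actual ``NonlinearPositionEstimator`` code; the functor name and the
choice of :math:`\alpha_{i,j}` as the current baseline length are assumptions,
and the sketch presumes the positions are not initialized to coincident values.

.. code-block:: c++

  #include <ceres/ceres.h>
  #include <Eigen/Core>

  // One pairwise constraint: R_i * (c_j - c_i) should align with the measured
  // relative translation direction t_ij, scaled by the baseline length.
  struct PairwiseTranslationError {
    PairwiseTranslationError(const Eigen::Matrix3d& rotation_i,
                             const Eigen::Vector3d& translation_direction)
        : rotation_i(rotation_i), translation_direction(translation_direction) {}

    template <typename T>
    bool operator()(const T* position_i, const T* position_j,
                    T* residuals) const {
      Eigen::Matrix<T, 3, 1> baseline;
      for (int k = 0; k < 3; ++k) {
        baseline(k) = position_j[k] - position_i[k];
      }
      // alpha is taken as the current baseline length so that the residual is
      // zero when R_i * (c_j - c_i) = alpha * t_ij holds exactly.
      const T alpha = baseline.norm();
      const Eigen::Matrix<T, 3, 1> rotated = rotation_i.cast<T>() * baseline;
      for (int k = 0; k < 3; ++k) {
        residuals[k] = rotated(k) - alpha * T(translation_direction(k));
      }
      return true;
    }

    Eigen::Matrix3d rotation_i;
    Eigen::Vector3d translation_direction;
  };

  // Adding the term for one view pair (i, j) to a ceres::Problem:
  //   problem.AddResidualBlock(
  //       new ceres::AutoDiffCostFunction<PairwiseTranslationError, 3, 3, 3>(
  //           new PairwiseTranslationError(R_i, t_ij)),
  //       nullptr, position_i.data(), position_j.data());
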
.. class:: NonlinearPositionEstimatorOptions

@@ -653,6 +727,40 @@ estimation.
using the nonlinear algorithm described above. Only positions that have an
orientation set are estimated. Returns true upon success and false on failure.

+
+ .. class:: LinearPositionEstimator
+
+   .. image:: global_linear_position_estimation.png
+      :width: 40%
+      :align: center
+
+   For the linear position estimator of [JiangICCV]_, we utilize an approximate
+   geometric error to determine the camera positions within a triplet as shown
+   above. The cost function we minimize is:
+
+   .. math:: f(i, j, k) = c_k - \dfrac{1}{2} \left( (c_i + ||c_k - c_i|| c_{ik}) + (c_j + ||c_k - c_j|| c_{jk}) \right)
+
+   This can be formed as a linear constraint in the unknown camera positions
+   :math:`c_i`. The solution that minimizes this cost lies in the null-space of
+   the resultant linear system. Instead of extracting the entire null-space as
+   [JiangICCV]_ does, we instead hold one camera constant at the origin and use
+   the Inverse-Iteration Power Method to efficiently determine the null vector
+   that best solves our minimization. This results in a dramatic speedup
+   without sacrificing accuracy.
+
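+   As a concrete illustration of this cost (a sketch only, with hypothetical
+   names; ``c_ik`` and ``c_jk`` denote the unit directions from cameras i and j
+   toward camera k in the global frame), the term can be evaluated as:
+
+   .. code-block:: c++
+
+     #include <Eigen/Core>
+
+     // Deviation of c_k from the midpoint of the two positions predicted from
+     // cameras i and j. In the solver this becomes a linear constraint on the
+     // unknown positions.
+     Eigen::Vector3d TripletCost(const Eigen::Vector3d& c_i,
+                                 const Eigen::Vector3d& c_j,
+                                 const Eigen::Vector3d& c_k,
+                                 const Eigen::Vector3d& c_ik,
+                                 const Eigen::Vector3d& c_jk) {
+       const Eigen::Vector3d predicted_from_i = c_i + (c_k - c_i).norm() * c_ik;
+       const Eigen::Vector3d predicted_from_j = c_j + (c_k - c_j).norm() * c_jk;
+       return c_k - 0.5 * (predicted_from_i + predicted_from_j);
+     }
+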
+   .. NOTE:: Currently this position estimation method is not integrated into
+      the Theia global SfM pipeline. More testing needs to be done with this
+      method before it can be reliably integrated.
+
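+   The inverse-iteration step itself can be sketched with Eigen as follows.
+   This illustrates only the numerical idea (the function name and shift value
+   are assumptions), not the ``LinearPositionEstimator`` implementation; ``A``
+   stands for the symmetric system matrix built from the triplet constraints
+   after fixing one camera at the origin.
+
+   .. code-block:: c++
+
+     #include <Eigen/Dense>
+
+     // Recover the (near-)null vector of a symmetric positive semi-definite
+     // matrix A by inverse iteration: repeatedly solving (A + shift * I) y = x
+     // amplifies the eigenvector with the smallest eigenvalue.
+     Eigen::VectorXd NullVectorByInverseIteration(const Eigen::MatrixXd& A,
+                                                  const int max_iterations,
+                                                  const double threshold) {
+       // A tiny shift keeps the factorization well defined when A is singular.
+       const double kShift = 1e-10;
+       const Eigen::LDLT<Eigen::MatrixXd> factorization(
+           A + kShift * Eigen::MatrixXd::Identity(A.rows(), A.cols()));
+
+       Eigen::VectorXd x = Eigen::VectorXd::Random(A.rows()).normalized();
+       for (int i = 0; i < max_iterations; ++i) {
+         const Eigen::VectorXd y = factorization.solve(x).normalized();
+         // Stop when the estimate stops changing (up to sign).
+         if ((y - x).norm() < threshold || (y + x).norm() < threshold) {
+           return y;
+         }
+         x = y;
+       }
+       return x;
+     }
+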
+   .. member:: int LinearPositionEstimator::Options::num_threads
+
+     DEFAULT: ``1``
+
+     The number of threads to use to solve for camera positions.
+
+   .. member:: int LinearPositionEstimator::Options::max_power_iterations
+
+     DEFAULT: ``1000``
+
+     Maximum number of power iterations to perform while solving for camera
+     positions.
+
+   .. member:: double LinearPositionEstimator::Options::eigensolver_threshold
+
+     DEFAULT: ``1e-8``
+
+     This number determines the convergence of the power iteration method. The
+     lower the threshold, the longer it will take to converge.
+
Triangulation
=============

@@ -853,7 +961,7 @@ the reprojection error.
Similarity Transformation
=========================

- .. function :: void AlignPointCloudsICP(const int num_points, const double left[], const double right[], double rotation[3 * 3 ], double translation[3 ])
+ .. function:: void AlignPointCloudsICP(const int num_points, const double left[], const double right[], double rotation[], double translation[])

We implement ICP for point clouds. We use Besl-McKay registration to align
point clouds. We use SVD decomposition to find the rotation, as this is much
@@ -863,7 +971,7 @@ Similarity Transformation
the left and right reconstructions have the same number of points, and that the
points are aligned by correspondence (i.e. left[i] corresponds to right[i]).

- .. function :: void AlignPointCloudsUmeyama(const int num_points, const double left[], const double right[], double rotation[3 * 3 ], double translation[3 ], double* scale)
+ .. function:: void AlignPointCloudsUmeyama(const int num_points, const double left[], const double right[], double rotation[], double translation[], double* scale)

This function estimates the 3D similarity transformation using the least
squares method of [Umeyama]_. The returned rotation, translation, and scale
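
A minimal usage sketch for the signature above is given below. The point values
are made up for illustration (the right points are the left points scaled by 2
and translated by (1, 2, 3)), and the ``theia`` namespace and umbrella header
are assumptions about how the function is exposed.

.. code-block:: c++

  #include <theia/theia.h>  // assumed umbrella header exposing the function above

  int main() {
    // Three corresponding points per cloud, stored as x, y, z triples.
    const double left[9]  = {0.0, 0.0, 0.0,  1.0, 0.0, 0.0,  0.0, 1.0, 0.0};
    const double right[9] = {1.0, 2.0, 3.0,  3.0, 2.0, 3.0,  1.0, 4.0, 3.0};

    double rotation[9];     // 3x3 rotation stored in a flat 9-element array.
    double translation[3];
    double scale;
    theia::AlignPointCloudsUmeyama(3, left, right, rotation, translation, &scale);
    // rotation, translation, and scale now describe the least-squares
    // similarity transformation relating the two point sets (see the
    // description above for the exact convention).
    return 0;
  }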
@@ -903,3 +1011,17 @@ Similarity Transformation
``solution_translation``: the translation of the candidate solutions

``solution_scale``: the scale of the candidate solutions
+
+ .. function:: void SimTransformPartialRotation(const Eigen::Vector3d& rotation_axis, const Eigen::Vector3d image_one_ray_directions[5], const Eigen::Vector3d image_one_ray_origins[5], const Eigen::Vector3d image_two_ray_directions[5], const Eigen::Vector3d image_two_ray_origins[5], std::vector<Eigen::Quaterniond>* soln_rotations, std::vector<Eigen::Vector3d>* soln_translations, std::vector<double>* soln_scales)
+
+   Solves for the similarity transformation that will transform rays in image
+   two so that they intersect with rays in image one, such that:
+
+   .. math:: s * R * X' + t = X
+
+   where s, R, t are the scale, rotation, and translation returned, X' is a
+   point in coordinate system 2 and X is the point transformed back to
+   coordinate system 1. Up to 8 solutions will be returned.
+
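+   For example, one returned candidate can be applied to a point as follows.
+   This is a small helper sketch grounded in the equation above; the helper
+   name is ours, not part of Theia.
+
+   .. code-block:: c++
+
+     #include <Eigen/Core>
+     #include <Eigen/Geometry>
+
+     // Applies one candidate similarity transformation (s, R, t) returned by
+     // the function above: X = s * R * X' + t.
+     Eigen::Vector3d TransformToFrameOne(const Eigen::Quaterniond& rotation,
+                                         const Eigen::Vector3d& translation,
+                                         const double scale,
+                                         const Eigen::Vector3d& point_in_frame_two) {
+       return scale * (rotation * point_in_frame_two) + translation;
+     }
+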
+   Please cite the paper "Computing Similarity Transformations from Only Image
+   Correspondences" by C. Sweeney et al. (CVPR 2015) [SweeneyCVPR2015]_ when
+   using this algorithm.