IEEE JOURNAL OF ROBOTICS AND AUTOMATION, VOL. RA-3, NO. 4, AUGUST 1987

A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses

ROGER Y. TSAI

Abstract-A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of the camera external position and orientation relative to the object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantages in terms of accuracy, speed, and versatility over the existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.

I. INTRODUCTION

A. The Importance of a Versatile Camera Calibration Technique

CAMERA CALIBRATION in the context of three-dimensional (3D) machine vision is the process of determining the internal camera geometric and optical characteristics (intrinsic parameters) and/or the 3D position and orientation of the camera frame relative to a certain world coordinate system (extrinsic parameters), for the following purposes.

1) Inferring 3D Information from Computer Image Coordinates: There are two kinds of 3D information to be inferred. They are different mainly because of the difference in applications.

a) The first is 3D information concerning the location of the object, target, or feature. For simplicity, if the object is a point feature (e.g., a point spot on a mechanical part illuminated by a laser beam, or the corner of an electrical component on a printed circuit board), camera calibration provides a way of determining a ray in 3D space that the object point must lie on, given the computer image coordinates. With two views taken either from two cameras or from one camera in two locations, the position of the object point can be determined by intersecting the two rays. Both intrinsic and extrinsic parameters need to be calibrated. The applications include mechanical part dimensional measurement, automatic assembly of mechanical or electronic components, tracking, robot calibration, and trajectory analysis. In the above applications, the camera calibration need be done only once.

b) The second kind is 3D information concerning the position and orientation of a moving camera (e.g., a camera held by a robot) relative to the target world coordinate system. The applications include robot calibration with camera-on-robot configuration, and robot vehicle guidance.

2) Inferring 2D Computer Image Coordinates from 3D Information: In model-driven inspection or assembly applications using machine vision, a hypothesis of the state of the world can be verified or confirmed by observing whether the image coordinates of the object conform to the hypothesis. In doing so, it is necessary to have both the intrinsic and extrinsic camera model parameters calibrated so that the two-dimensional (2D) image coordinates can be properly predicted given the hypothetical 3D location of the object.
The above purposes can be best served if the following criteria for the camera calibration are met.

1) Autonomous: The calibration procedure should not require operator intervention such as giving initial guesses for certain parameters, or choosing certain system parameters manually.

2) Accurate: Many applications such as mechanical part inspection, assembly, or robot arm calibration require an accuracy that is one part in a few thousand of the working range. The camera calibration technique should have the potential of meeting such accuracy requirements. This requires that the theoretical modeling of the imaging process be accurate (it should include lens distortion and perspective rather than parallel projection).

3) Reasonably Efficient: The complete camera calibration procedure should not include a high-dimension (more than five) nonlinear search. Since the type b) application mentioned earlier needs repeated calibration of extrinsic parameters, the calibration approach should allow enough potential for high-speed implementation.

4) Versatile: The calibration technique should operate uniformly and autonomously for a wide range of accuracy requirements, optical setups, and applications.

5) Need Only Common Off-the-Shelf Camera and Lens: Most camera calibration techniques developed in the photogrammetric area require special professional cameras and processing equipment. Such requirements prohibit full automation and are labor-intensive and time-consuming to implement.

Manuscript received October 18, 1985; revised September 2, 1986. A version of this paper was presented at the 1986 IEEE International Conference on Computer Vision and Pattern Recognition and received the Best Paper Award. The author is with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598. IEEE Log Number 8613011.
The advantages of using off-the-shelf solid state or vidicon cameras and lenses are: versatility-solid state cameras and lenses can be used for a variety of automation applications; availability-since off-the-shelf solid state cameras and lenses are common in many applications, they are at hand when you need them and need not be custom ordered; familiarity and user-friendliness-not many people have experience operating the professional metric camera used in photogrammetry or calibrating the tetralateral photodiode with preamplifier and associated electronics, while a solid state camera is easily interfaced with a computer and easy to install. The next section shows deficiencies of existing techniques in one or more of these criteria. (Footnote: Although existing techniques such as direct linear transformation (see Section I-B) can be implemented using common solid state or vidicon cameras, the version NBS implemented uses a high resolution analog tetralateral photodiode, and the associated optoelectronics accessories need special manual calibration; see [5] for details.)

B. Why Existing Techniques Need Improvement

In this section, existing techniques are first classified into several categories. The strengths and weaknesses of each category are analyzed.

Category I-Techniques Involving Full-Scale Nonlinear Optimization: See [1]-[3], [7], [10], [14], [17], [22], [30], for example.

Advantage: It allows easy adaptation of any arbitrarily accurate yet complex model for imaging. The best accuracy obtained in this category is comparable to the accuracy of the new technique proposed in this paper.

Problems: 1) It requires a good initial guess to start the nonlinear search. This violates the principle of automation. 2) It needs a computer-intensive full-scale nonlinear search.

Classical approach: Faig's technique [7] is a good representative of these techniques. It uses a very elaborate model for imaging, uses at least 17 unknowns for each photo, and is very computer-intensive [7]. However, because of the large number of unknowns, the accuracy is excellent. The rms (root mean square or average) error can be as good as 0.1 mil. However, this rms error is in photo scale (i.e., the error of fitting the model with the observations in the image plane). When transformed into 3D error, it is comparable to the average error (0.5 mil) obtained using the monoview multiplane calibration technique, which is the typical case among the various two-stage techniques proposed in this paper. Another reason why such photogrammetric techniques produce very accurate results is that a large professional format photo is used rather than a solid-state image array such as a CCD. The resolution for such photos is generally three to four times better than that of a solid-state imaging sensor array.

Direct linear transformation (DLT): Another example is the direct linear transformation (DLT) developed by Abdel-Aziz and Karara [1], [2]. One reason why DLT was developed is that only linear equations need be solved. However, it was later found that, unless lens distortion is ignored, a full-scale nonlinear search is needed. In [14, p. 36] Karara, the co-inventor of DLT, comments:

When originally presented in 1971 (Abdel-Aziz and Karara, 1971), the DLT basic equations did not involve any image refinement parameters, and represented an actual linear transformation between comparator coordinates and object-space coordinates.
When the DLT mathematical model was later expanded to encompass image refinement parameters, the title DLT was retained unchanged.

Although Wong [30] mentioned that there are two possible procedures for using DLT (one entails solving linear equations only, and the other requires nonlinear search), the procedure using linear equation solving actually contains approximation. One of the artificial parameters he introduced is a function of the (x, y, z) world coordinate and therefore not a constant. Nevertheless, DLT bridges the gap between photogrammetry and computer vision so that both areas can use DLT directly to solve the camera calibration problem. When lens distortion is not considered, DLT falls into the second category (to be discussed later), which entails solving linear equations only. It, too, has its pros and cons and will be discussed later when the second category is presented.

Dainis and Juberts [5] from the Manufacturing Engineering Center of NBS reported results using DLT for camera calibration to do accurate measurement of robot trajectory motion. Although the NBS system can do 3D measurement at a rate of 40 Hz, the camera calibration was not and need not be done in real time. The accuracy reported uses the same type of measure for assessing or evaluating camera calibration accuracy as the Type I measure used in this paper (see Section III-A). The total accuracy in 3D is one part in 2000 within the center 80 percent of the detector field of view. This is comparable to the accuracy of the proposed two-stage method in measuring the x and y parts of the 3D coordinates (the proposed two-stage technique yields better percentage accuracy for the depth). Notice, however, that the image sensing device NBS used is not a TV camera but a tetralateral photodiode. It senses the position of the incident light spot on the surface of the detector by analog means and uses a 12-bit A/D converter to convert the analog positions into a digital quantity to be processed by the computer. Therefore, the tetralateral photodiode has an effective 4K x 4K spatial resolution, as opposed to a 388 x 480 full-resolution Fairchild CCD area sensor. Many thought that solid-state imaging sensors, with their low resolution characteristics, could not be used for high-accuracy 3D metrology. This paper reveals that with proper calibration, a solid-state sensor (such as a CCD) is still a valid tool in high-accuracy 3D machine vision metrology applications. Dainis and Juberts [5] mentioned that the accuracy is 100 percent lower for points outside the center 90-percent field of view. This suggests that lens distortion is not considered when using DLT to calibrate the camera. Therefore, only linear equations need be solved. This actually puts the NBS work in the category that follows, which includes all techniques that compute the perspective transformation matrix first. Again, the pros and cons for the latter will be discussed later.

Sobel, Gennery, Lowe: Sobel [23] described a system for calibrating a camera using nonlinear equation solving. Eighteen parameters must be optimized. The approach is similar to Faig's method described earlier.
No accuracy results were reported. Gennery [10] described a method that finds camera parameters iteratively by minimizing the error of epipolar constraints without using 3D coordinates of calibration points. It is mentioned in [4, p. 253] and [20, p. 50] that the technique is too error-prone.

Category II-Techniques Involving Computing the Perspective Transformation Matrix First Using Linear Equation Solving: See [1], [2], [9], [11], [14], [24], [25], and [31], for example.

Advantage: No nonlinear optimization is needed.

Problems: 1) Lens distortion cannot be considered. 2) The number of unknowns in the linear equations is much larger than the actual degrees of freedom (i.e., the unknowns to be solved are not linearly independent). The disadvantage of such redundant parameterization is that an erroneous combination of these parameters can still make a good fit between experimental observations and model prediction in a real situation where the observation is not perfect. This means the accuracy potential is limited in noisy situations.

Although the equations characterizing the transformation from 3D world coordinates to 2D image coordinates are nonlinear functions of the extrinsic and intrinsic camera model parameters (see Sections II-C1 and II-C2 for the definition of the camera parameters), they are linear if lens distortion is ignored and if the coefficients of the 3 x 4 perspective transformation matrix are regarded as unknown parameters (see Duda and Hart [6] for a definition of the perspective transformation matrix). Given the 3D world coordinates of a number of points and the corresponding 2D image coordinates, the coefficients of the perspective transformation matrix can be solved by a least squares solution of an overdetermined system of linear equations. Given the perspective transformation matrix, the camera model parameters can then be computed if needed. However, many investigators have found that ignoring lens distortion is unacceptable when doing 3D measurement (e.g., Itoh et al. [12], Luh and Klassen [16]). The error of 3D measurement reported in this paper using the two-stage camera calibration technique would have been an order of magnitude larger if the lens distortion were not corrected.

Sutherland: Sutherland [25] formulated very explicitly the procedure for computing the perspective transformation matrix given the 3D world coordinates and 2D image coordinates of a number of points. It was applied to graphics applications, and no accuracy results are reported.

Yakimovsky and Cunningham: Yakimovsky and Cunningham's stereo program [31] was developed for the JPL Robotics Research Vehicle, a testbed for a Mars rover and remote processing systems. Due to the narrow field of view and large object distance, they used a highly linear lens and ignored distortion. They reported a 3D measurement accuracy of +/-5 mm at a distance of 2 m. This is equivalent to a depth resolution of one part in 400, which is one order of magnitude less accurate than the test results to be described in this paper. One reason is that Yakimovsky and Cunningham's system does not consider lens distortion. The other reason is probably that the unknown parameters computed by linear equations are not linearly independent. Notice also that had it not been for the fact that the field of view in Yakimovsky and Cunningham's system is narrow and that the object distance is large, ignoring distortion would cause more error.
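To make the Category II idea concrete, the following is a minimal sketch, not from the paper, of solving for the 3 x 4 perspective transformation matrix by linear least squares while ignoring lens distortion. The function name and the normalization of the last coefficient to 1 are our own illustrative choices.

```python
# Sketch of the Category II approach: fit the 11 independent coefficients of a
# 3 x 4 perspective transformation matrix by linear least squares (no distortion).
import numpy as np

def fit_perspective_matrix(world_pts, image_pts):
    """world_pts: (N, 3) array of (xw, yw, zw); image_pts: (N, 2) array of (X, Y)."""
    A, b = [], []
    for (xw, yw, zw), (X, Y) in zip(world_pts, image_pts):
        # X = (p11*xw + p12*yw + p13*zw + p14) / (p31*xw + p32*yw + p33*zw + 1),
        # rearranged so it is linear in the unknown coefficients p_ij.
        A.append([xw, yw, zw, 1, 0, 0, 0, 0, -X * xw, -X * yw, -X * zw])
        b.append(X)
        A.append([0, 0, 0, 0, xw, yw, zw, 1, -Y * xw, -Y * yw, -Y * zw])
        b.append(Y)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(p, 1.0).reshape(3, 4)   # fix p34 = 1 to remove the overall scale
```

With noisy observations, the twelve entries recovered this way are not forced to be consistent with a physically realizable camera, which is exactly the redundant-parameterization problem noted above.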
DLT: By disregarding lens distortion, the DLT developed by Abdel-Aziz and Karara [1], [2] described in Category I falls into Category II. Accuracy results on real experiments have been reported only by Dainis and Juberts from NBS [5]. The accuracy results and the comparison with the proposed technique are described earlier under Category I.

Hall et al.: Hall et al. [11] used a straightforward linear least squares technique to solve for the elements of the perspective transformation matrix for doing 3D curved surface measurement. The computed 3D coordinates were tabulated, but no ground truth was given, and therefore the accuracy is unknown.

Ganapathy, Strat: Ganapathy [9] derived a noniterative technique for computing camera parameters given the perspective transformation matrix computed using any of the techniques discussed in this category. He used the perspective transformation matrix given by Potmesil through private communications and computed the camera parameters. It was not applied to 3D measurement, and therefore no accuracy results were available. Similar results were obtained by Strat [24].

Category III-Two-Plane Method: See [13] and [19] for example.

Advantage: Only linear equations need be solved.

Problems: 1) The number of unknowns is at least 24 (12 for each plane), much larger than the degrees of freedom. 2) The formula used for the transformation between image and object coordinates is empirically based only.

The two-plane method developed by Martins et al. [19] theoretically can be applied in general without having any restrictions on the extrinsic camera parameters. However, for the experimental results reported, the relative orientation between the camera coordinate system and the object world coordinate system was assumed known (no relative rotation). In such a restricted case, the average error is about 4 mil at a distance of 25 in, which is comparable to the accuracy obtained using the proposed technique. Since the formula for the transformation between image and object coordinates is empirically based, it is not clear what kind of approximation is assumed. Nonlinear lens distortion theoretically cannot be corrected. A general calibration using the two-plane technique was proposed by Isaguirre et al. [13]. Full-scale nonlinear optimization is needed. No experimental results were reported.

Category IV-Geometric Technique: See [8] for example.

Advantage: No nonlinear search is needed.

Problems: 1) No lens distortion can be considered. 2) The focal length is assumed given. 3) Uncertainty of the image scale factor is not allowed.

Fischler and Bolles [8] use a geometric construction to derive a direct solution for the camera location and orientation. However, none of the camera intrinsic parameters (see Section II-C2) can be computed. No accuracy results for real 3D measurement were reported.
II. THE NEW APPROACH TO MACHINE VISION CAMERA CALIBRATION USING A TWO-STAGE TECHNIQUE

In the following, an overview is first given that describes the strategy we took in approaching the problem. After the overview, the underlying camera model and the definition of the parameters to be calibrated are described. Then, the calibration algorithm, the theoretical derivation, and other issues will be presented. Those readers who would like to have a physical feeling of how to perform calibration in a real setup should first read "Experimental Procedure," Section IV-A1.

A. Overview

Camera calibration entails solving for a large number of calibration parameters, resulting in the classical approach mentioned in the Introduction that requires large-scale nonlinear search. The conventional way of avoiding this large-scale nonlinear search is to use approaches similar to the DLT described in the Introduction, which solve for a set of parameters (coefficients of a homogeneous transformation matrix) with linear equations, ignoring the dependency between the parameters, resulting in a situation where the number of unknowns is greater than the number of degrees of freedom. The lens distortion is also ignored (see the Introduction for more detail).

Our approach is to look for a real constraint or equation that is only a function of a subset of the calibration parameters, to reduce the dimensionality of the unknown parameter space. It turns out that such a constraint does exist, and we call it the radial alignment constraint (to be described later). This constraint (or the equations resulting from this physical constraint) is only a function of the relative rotation and translation (except for the z component) between the camera and the calibration points (see Section II-B for detail). Furthermore, although the constraint is a nonlinear function of the abovementioned calibration parameters (called group I parameters), there is a simple and efficient way of computing them. The rest of the calibration parameters (called group II parameters) are computed with normal projective equations. A very good initial guess of the group II parameters can be obtained by ignoring the lens distortion and using a simple linear equation with two unknowns. The precise values for the group II parameters can then be computed with one or two iterations in minimizing the perspective equation error. Be aware that when single-plane calibration points are used, the plane must not be exactly parallel to the image plane (see (15), to follow, for detail). Due to the accurate modeling of the image-to-object transformation described in the next section, subpixel accuracy interpolation for extracting the image coordinates of calibration points can be used to enhance the calibration accuracy to the maximum. Note that this is not true if a DLT-type linear approximation technique is used, since ignoring distortion results in an image coordinate error of more than a pixel unless a very narrow angle lens is used. One way of achieving subpixel accuracy image feature extraction is described in Section IV-A1.

B. The Camera Model

This section describes the camera model, defines the calibration parameters, and presents the simple radial alignment principle (to be described in Section II-E) that provides the original motivation for the proposed technique.

[Fig. 1. Camera geometry with perspective projection and radial lens distortion.]
The camera model itself is basically the same as that used by any of the techniques in Category I of Section I-B.

1) The Four Steps of Transformation from 3D World Coordinate to Camera Coordinate: Fig. 1 illustrates the basic geometry of the camera model. (x_w, y_w, z_w) is the 3D coordinate of the object point P in the 3D world coordinate system. (x, y, z) is the 3D coordinate of the object point P in the 3D camera coordinate system, which is centered at point O, the optical center, with the z axis the same as the optical axis. (X, Y) is the image coordinate system centered at O_i (the intersection of the optical axis z and the front image plane) and parallel to the x and y axes. f is the distance between the front image plane and the optical center. (X_u, Y_u) is the image coordinate of (x, y, z) if a perfect pinhole camera model is used. (X_d, Y_d) is the actual image coordinate, which differs from (X_u, Y_u) due to lens distortion. However, since the unit for (X_f, Y_f), the coordinate used in the computer, is the number of pixels for the discrete image in the frame memory, additional parameters need be specified (and calibrated) that relate the image coordinate in the front image plane to the computer image coordinate system. The overall transformation from (x_w, y_w, z_w) to (X_f, Y_f) is depicted in Fig. 2. Step 4 is special to industrial machine vision applications where TV cameras (particularly solid-state CCD or CID) are used. The following is the transformation in analytic form for the four steps in Fig. 2.

Step 1: Rigid body transformation from the object world coordinate system (x_w, y_w, z_w) to the camera 3D coordinate system (x, y, z):

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T \qquad (1)$$
[Fig. 2. Four steps of transformation from 3D world coordinate to computer image coordinate:
(x_w, y_w, z_w) 3D world coordinate
-- Step 1: rigid body transformation (parameters to be calibrated: R, T) -->
(x, y, z) 3D camera coordinate
-- Step 2: perspective projection with pinhole geometry (parameter to be calibrated: f) -->
(X_u, Y_u) ideal undistorted image coordinate
-- Step 3: radial lens distortion (parameters to be calibrated: kappa_1, kappa_2) -->
(X_d, Y_d) distorted image coordinate
-- Step 4: TV scanning, sampling, computer acquisition (parameter to be calibrated: uncertainty image scale factor s_x) -->
(X_f, Y_f) computer image coordinate in frame memory]

where R is the 3 x 3 rotation matrix

$$R = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}, \qquad (2)$$

and T is the translation vector

$$T \equiv \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}. \qquad (3)$$

The parameters to be calibrated are R and T. Note that the rigid body transformation from one Cartesian coordinate system (x_w, y_w, z_w) to another (x, y, z) is unique if the transformation is defined as a 3D rotation around the origin (be it defined as three separate rotations-yaw, pitch, and roll-around axes passing through the origin) followed by the 3D translation. Most of the existing techniques for camera calibration (e.g., see Section I-B) define the transformation as translation followed by rotation. It will be seen later (see Section II-E) that this order (rotation followed by translation) is crucial to the motivation and development of the new calibration technique.

Step 2: Transformation from the 3D camera coordinate (x, y, z) to the ideal (undistorted) image coordinate (X_u, Y_u) using perspective projection with pinhole camera geometry:

$$X_u = f\,\frac{x}{z} \qquad (4a)$$

$$Y_u = f\,\frac{y}{z} \qquad (4b)$$

The parameter to be calibrated is the effective focal length f.

Step 3: Radial lens distortion:

$$X_d + D_x = X_u \qquad (5a)$$

$$Y_d + D_y = Y_u \qquad (5b)$$

where (X_d, Y_d) is the distorted or true image coordinate on the image plane and

$$D_x = X_d(\kappa_1 r^2 + \kappa_2 r^4 + \cdots), \quad D_y = Y_d(\kappa_1 r^2 + \kappa_2 r^4 + \cdots), \quad r = \sqrt{X_d^2 + Y_d^2}.$$
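As a rough sketch (not from the paper; a single radial distortion term kappa_1 is assumed and the helper names are ours), Steps 1-3 can be written directly from (1), (4), and (5):

```python
import numpy as np

def world_to_undistorted(P_w, R, T, f):
    # Step 1, eq. (1): camera coordinates; Step 2, eq. (4): pinhole projection.
    x, y, z = R @ np.asarray(P_w, dtype=float) + T
    return f * x / z, f * y / z                      # (X_u, Y_u)

def distorted_to_undistorted(X_d, Y_d, kappa1):
    # Step 3, eq. (5) with a single radial term: X_u = X_d (1 + kappa1 * r^2).
    r2 = X_d ** 2 + Y_d ** 2
    return X_d * (1.0 + kappa1 * r2), Y_d * (1.0 + kappa1 * r2)
```

Matching the undistorted coordinates predicted from a known calibration point against those recovered from its measured, distortion-corrected image location is the consistency that the calibration procedure exploits.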
The parameters to be calibrated are the distortion coefficients kappa_1 and kappa_2. The modeling of lens distortion can be found in [18]. There are two kinds of distortion: radial and tangential. For each kind of distortion, an infinite series is required. However, my experience shows that for industrial machine vision applications, only radial distortion needs to be considered, and only one term is needed. Any more elaborate modeling not only would not help but also would cause numerical instability.

Step 4: Real image coordinate (X_d, Y_d) to computer image coordinate (X_f, Y_f) transformation:

$$X_f = s_x \, {d'_x}^{-1} X_d + C_x \qquad (6a)$$

$$Y_f = d_y^{-1} Y_d + C_y \qquad (6b)$$

where

(X_f, Y_f)  row and column numbers of the image pixel in computer frame memory,
(C_x, C_y)  row and column numbers of the center of computer frame memory,
d_x   center-to-center distance between adjacent sensor elements in the X (scan line) direction,  (6e)
d_y   center-to-center distance between adjacent CCD sensor elements in the Y direction,  (6f)
N_cx  number of sensor elements in the X direction,  (6g)
N_fx  number of pixels in a line as sampled by the computer.  (6h)

The parameter to be calibrated is the uncertainty image scale factor s_x.

To transform between the computer image coordinate (in the form of rows and columns in the frame buffer) and the real image coordinate, obviously the distances between two adjacent pixels in both the row and column directions of the frame buffer, mapped to the real image coordinates, need be used. When a vidicon camera is used, where both such distances are not known a priori, the multiplane (rather than single plane) calibration technique described in this paper must be used. However, the scale in y is absorbed by the focal length, since the focal length scales the image in both the x and y directions. Therefore, d_y in (6b) should be set to one, while the computed focal length f will be a product of the actual focal length and the scale factor in y. Also, N_cx and N_fx in (6d) should be set to one since they only apply to CCD cameras. Note that if a vidicon type camera is used, the sensor element or pixel mentioned earlier should be regarded as each individual resolution element in the receptor area, with the resolution being determined by the sampling rate.

If a solid-state CCD or CID discrete array sensor is used and full resolution is used, since the image is scanned line by line, obviously the distance between adjacent pixels in the y direction is just the same as d_y, the center-to-center distance between adjacent CCD sensor elements in the Y direction. Therefore, (6b) is the right relationship between Y_d and Y_f. If only the odd or the even field is used, then d_y is twice the center-to-center distance between adjacent CCD sensor elements in the Y direction. The situation in X is different. Normally, in TV camera scanning, an analog waveform is generated for each image line by zeroth-order sample and hold. Then it is sampled by the computer into N_fx samples. Therefore, one would easily draw the conclusion that

$$d'_x = d_x \, N_{cx}/N_{fx}. \qquad (6d)$$

Normally, manufacturers of CCD cameras supply information on d_x and d_y (defined in (6e) and (6f)) to submicron accuracy. However, an additional uncertainty parameter has to be introduced. This is due to a variety of factors, such as a slight hardware timing mismatch between the image acquisition hardware and the camera scanning hardware, or the imprecision of the timing of TV scanning itself. Even a one-percent difference can cause a three- to five-pixel error for a full resolution frame.
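The following is a small sketch (helper name ours, not from the paper) of Step 4 as given by (6a), (6b), and (6d): converting a pixel location in the frame buffer into the real image coordinate on the sensor plane.

```python
# Sketch of Step 4, eq. (6): frame-buffer pixel location -> real image coordinate.
def frame_to_image(Xf, Yf, Cx, Cy, dx, dy, Ncx, Nfx, sx):
    dx_prime = dx * Ncx / Nfx           # eq. (6d): effective sample spacing in x
    Xd = (Xf - Cx) * dx_prime / sx      # inverse of eq. (6a)
    Yd = (Yf - Cy) * dy                 # inverse of eq. (6b)
    return Xd, Yd
```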
Our experience with the Fairchild CCD 3000 camera shows that the uncertainty is as much as five percent. Therefore, an unknown parameter s_x is introduced in (6a) to accommodate this uncertainty. To include it in the list of unknown parameters to be calibrated, the multiplane calibration technique described in this paper should be used. However, there are a variety of other simple techniques one can use to determine this scale factor in advance (see Lenz and Tsai [15]). In this case, the single plane calibration technique suffices. The issue of (C_x, C_y) will be discussed later (see the Note at the end of the paper).

C. Equations Relating the 3D World Coordinates to the 2D Computer Image Coordinates

By combining the last three steps, the (X, Y) computer image coordinate is related to (x, y, z), the 3D coordinate of the object point in the camera coordinate system, by the following equations:

$$s_x^{-1} d'_x X + s_x^{-1} d'_x X \kappa_1 r^2 = f\,\frac{x}{z} \qquad (7a)$$

$$d_y Y + d_y Y \kappa_1 r^2 = f\,\frac{y}{z} \qquad (7b)$$

where

$$r = \sqrt{(s_x^{-1} d'_x X)^2 + (d_y Y)^2}.$$

Substituting (1) into (7a) and (7b) gives

$$s_x^{-1} d'_x X (1 + \kappa_1 r^2) = f\,\frac{r_1 x_w + r_2 y_w + r_3 z_w + T_x}{r_7 x_w + r_8 y_w + r_9 z_w + T_z} \qquad (8a)$$

$$d_y Y (1 + \kappa_1 r^2) = f\,\frac{r_4 x_w + r_5 y_w + r_6 z_w + T_y}{r_7 x_w + r_8 y_w + r_9 z_w + T_z} \qquad (8b)$$

where R and T are the rotation matrix and translation vector defined in (2) and (3).
The parameters used in the transformation in Fig. 2 can be categorized into the following two classes.

1) Extrinsic Parameters: The parameters in Step 1 in Fig. 2, for the transformation from the 3D object world coordinate system to the camera 3D coordinate system centered at the optical center, are called the extrinsic parameters. There are six extrinsic parameters: the Euler angles yaw theta, pitch phi, and tilt psi for rotation, and the three components of the translation vector T. The rotation matrix R can be expressed as a function of theta, phi, and psi as follows:

$$R = \begin{bmatrix}
\cos\psi\cos\theta & \sin\psi\cos\theta & -\sin\theta \\
-\sin\psi\cos\phi + \cos\psi\sin\theta\sin\phi & \cos\psi\cos\phi + \sin\psi\sin\theta\sin\phi & \cos\theta\sin\phi \\
\sin\psi\sin\phi + \cos\psi\sin\theta\cos\phi & -\cos\psi\sin\phi + \sin\psi\sin\theta\cos\phi & \cos\theta\cos\phi
\end{bmatrix} \qquad (9)$$

2) Intrinsic Parameters: The parameters in Steps 2-4 in Fig. 2, for the transformation from the 3D object coordinate in the camera coordinate system to the computer image coordinate, are called the intrinsic parameters. There are six intrinsic parameters:

f  effective focal length, or image plane to projective center distance,
kappa_1, kappa_2  lens distortion coefficients,
s_x  uncertainty scale factor for x, due to TV camera scanning and acquisition timing error,
(C_x, C_y)  computer image coordinate for the origin in the image plane.

D. Problem Definition

The problem of camera calibration is to compute the camera intrinsic and extrinsic parameters based on a number of points whose object coordinates in the (x_w, y_w, z_w) coordinate system are known and whose image coordinates (X, Y) are measured.

E. The New Two-Stage Camera Calibration Technique: Motivation

The original basis of the new technique is the following four observations.

Observation I: Since we assume that the distortion is radial, no matter how much the distortion is, the direction of the vector O_iP_d extending from the origin O_i in the image plane to the image point (X_d, Y_d) remains unchanged and is radially aligned with the vector P_ozP extending from the optical axis (or, more precisely, from the point P_oz on the optical axis whose z coordinate is the same as that of the object point (x, y, z)) to the object point (x, y, z). This is illustrated in Fig. 3. See Appendix I for a geometric and an algebraic proof of the radial alignment constraint (RAC).

Observation II: The effective focal length f also does not influence the direction of the vector O_iP_d, since f scales the image coordinates X_d and Y_d by the same rate.

Observation III: Once the object world coordinate system is rotated and translated in x and y as in Step 1 such that O_iP_d is parallel to P_ozP for every point, then translation in z will not alter the direction of O_iP_d (this comes from the fact that, according to (4a) and (4b), z changes X_u and Y_u by the same scale, so that O_iP_u // O_iP_d).

Observation IV: The constraint that O_iP_d is parallel to P_ozP for every point, having been shown to be independent of the radial distortion coefficients kappa_1 and kappa_2, the effective focal length f, and the z component of the 3D translation vector T, is actually sufficient to determine the 3D rotation R, the x and y components of the 3D translation from the world coordinate system to the camera coordinate system, and the uncertainty scale factor s_x in the X component of the image coordinate.

Among the four observations, the first three are clearly true, while the last one requires some geometric intuition and "imagination" to establish its validity. It is possible for the author to go into further details on how this intuition was reached, but that would not be sufficient for a complete proof.
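As an illustration (not part of the paper's algorithm), the sketch below builds R of (9) from the three Euler angles and checks the radial alignment constraint numerically: however the image is scaled radially, and whatever f and T_z are, the distorted image vector stays parallel to (x, y). The numeric values are arbitrary.

```python
# Numerical sketch of eq. (9) and the radial alignment constraint (RAC).
import numpy as np

def rotation_from_euler(theta, phi, psi):
    """Rotation matrix of eq. (9), built from yaw theta, pitch phi, and tilt psi."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    cs, ss = np.cos(psi), np.sin(psi)
    return np.array([
        [cs * ct,                 ss * ct,                 -st],
        [-ss * cp + cs * st * sp, cs * cp + ss * st * sp,  ct * sp],
        [ss * sp + cs * st * cp,  -cs * sp + ss * st * cp, ct * cp]])

# RAC check: (X_d, Y_d) remains parallel to (x, y) for any radial scaling and any f.
R = rotation_from_euler(0.1, -0.2, 0.3)
T = np.array([5.0, -3.0, 100.0])
x, y, z = R @ np.array([10.0, 20.0, 0.0]) + T   # arbitrary coplanar world point
Xu, Yu = 8.5 * x / z, 8.5 * y / z               # any focal length
Xd, Yd = 0.97 * Xu, 0.97 * Yu                   # any purely radial distortion scaling
print(abs(x * Yd - y * Xd))                     # cross product ~ 0: radially aligned
```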
Rather, the complete proof will be given analytically in the next few sections. In fact, as we will see later, not only is the radial alignment constraint sufficient to determine uniquely the extrinsic parameters (except for T_z) and one of the intrinsic parameters (s_x), but also the computation entails only the solution of linear equations with five to seven unknowns. This means it can be done fast and done automatically, since no initial guess, which is normally required for nonlinear optimization, is needed.

F. Calibrating a Camera Using a Monoview Coplanar Set of Points

To aid those readers who intend to implement the proposed technique in their applications, the presentation will be algorithm-oriented. The computation procedure for each individual step will first be given, while the derivation and other theoretical issues will follow. Most technical details appear in the Appendices. Fig. 4 illustrates the setup for calibrating a camera using a monoview coplanar set of points. In the actual setup, the plane illustrated in the figure is the top surface of a metal block. The detailed description of the physical setup is given in Section IV-A1. Since the calibration points are on a common plane, the (x_w, y_w, z_w) coordinate system can be chosen such that z_w = 0 and the origin is not close to the center of the view or the y axis of the camera coordinate system. Since the (x_w, y_w, z_w) coordinate system is user-defined and the origin is arbitrary, it is no problem setting the origin of (x_w, y_w, z_w) to be out of the field of view and not close to the y axis. The purpose of the latter is to make sure that T_y is not exactly zero, so that the presentation of the computation procedure described in the following can be made more unified and simpler. (In case it is zero, it is quite straightforward to modify the algorithm, but this is unnecessary since it can be avoided.)

1) Stage 1-Compute 3D Orientation and Position (x and y):
[Fig. 3. Illustration of the radial alignment constraint. Radial distortion does not alter the direction of the vector from the origin to the image point, which leads to O_iP_d // O_iP_u // P_ozP.]

[Fig. 4. Schematic diagram of the experimental setup for camera calibration using a monoview coplanar set of points. A flat surface holds the calibration points, which are the corners of squares.]

a) Compute the distorted image coordinates (X_d, Y_d):

Procedure:
i) Grab a frame into the computer frame memory. Detect the row and column number of each calibration point i. Call it (X_fi, Y_fi).
ii) Obtain N_cx, N_fx, d'_x, d_y according to (6c)-(6h) using the information about the camera and frame memory supplied by the manufacturer.
iii) Take (C_x, C_y) to be the center pixel of the frame memory (see i) in "Derivation and discussion" below).
iv) Compute (X_di, Y_di) using (6a) and (6b):

$$X_{di} = s_x^{-1} d'_x (X_{fi} - C_x)$$
$$Y_{di} = d_y (Y_{fi} - C_y)$$

for i = 1, ..., N, where N is the total number of calibration points. See ii) in "Derivation and discussion" below concerning s_x.

Derivation and discussion (also see Note at end of paper):

i) Issues concerning the image origin: Currently, we do not include the image center (C_x, C_y) in the list of camera parameters to be calibrated. We simply take the apparent center of the computer image frame buffer to be the center. The results of the real experiments show that when a full resolution CCD camera is calibrated with the proposed technique, it is able to make 3D measurement with one part in 4000 average accuracy. To see the consequence of having a wrongly guessed image center when doing calibration, we intentionally altered the apparent image center by ten pixels. The results of the 3D measurement are still about as accurate. We have not yet conducted experiments with the image origin way off the apparent center of the sampled image. While doing the experiments, we did not take the center of the frame memory to be the center of the sampled image or the image origin. It is often the case that image acquisition hardware may have a slight timing error such that the start of each line may be either too early or too late, causing the RS170 video from the CCD camera to be sampled in the front or back porch (the porch is the blanking interval between each line of active video). A similar situation may occur in the vertical direction, but usually to a much lesser extent. The user should observe the pixel values in the frame memory and see if there are any blank lines on the border. For example, if there are eight blank lines on the left border and five blank lines on the top border, the image origin should be taken as the center of the frame memory offset (added) by (8, 5), which is the case we encountered in the real experiments described in Section IV.

ii) Issues concerning the uncertainty scale factor s_x: Unlike the multiplane case, the single plane case does not calibrate the scale factor s_x. In e) of Section IV-A1, it is explained in what situation one does not need to calibrate s_x and how to get a priori knowledge of s_x. See also Step 4 of the transformation from 3D world coordinate to camera coordinate in Section II-B.

b) Compute the five unknowns T_y^{-1} r_1, T_y^{-1} r_2, T_y^{-1} T_x, T_y^{-1} r_4, T_y^{-1} r_5:
Procedure: For each point i with (x_wi, y_wi, z_wi) as the 3D object coordinate and (X_di, Y_di) as the corresponding image coordinate (computed in a) above), set up the following linear equation with T_y^{-1} r_1, T_y^{-1} r_2, T_y^{-1} T_x, T_y^{-1} r_4, and T_y^{-1} r_5 as unknowns:

$$\begin{bmatrix} Y_{di} x_{wi} & Y_{di} y_{wi} & Y_{di} & -X_{di} x_{wi} & -X_{di} y_{wi} \end{bmatrix}
\begin{bmatrix} T_y^{-1} r_1 \\ T_y^{-1} r_2 \\ T_y^{-1} T_x \\ T_y^{-1} r_4 \\ T_y^{-1} r_5 \end{bmatrix} = X_{di}. \qquad (10)$$

With N (the number of object points) much larger than five, an overdetermined system of linear equations can be established for the five unknowns T_y^{-1} r_1, T_y^{-1} r_2, T_y^{-1} T_x, T_y^{-1} r_4, and T_y^{-1} r_5, and solved.
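In code, step b) amounts to stacking one row of (10) per calibration point and taking the least squares solution. A minimal sketch (variable names ours; coplanar points with z_w = 0 assumed):

```python
# Sketch of Stage 1, step b): solve the overdetermined system (10) by least squares.
import numpy as np

def solve_stage1_unknowns(world_xy, Xd, Yd):
    """world_xy: (N, 2) array of (x_w, y_w); Xd, Yd: (N,) distorted image coordinates.
    Returns [Ty^-1 r1, Ty^-1 r2, Ty^-1 Tx, Ty^-1 r4, Ty^-1 r5]."""
    xw, yw = world_xy[:, 0], world_xy[:, 1]
    A = np.column_stack([Yd * xw, Yd * yw, Yd, -Xd * xw, -Xd * yw])  # one row of (10) per point
    v, *_ = np.linalg.lstsq(A, Xd, rcond=None)
    return v
```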