WO2018150086A2 - Methods and apparatuses for determining positions of multi-directional image capture apparatuses - Google Patents
- Publication number
- WO2018150086A2 (PCT/FI2018/050095)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- images
- generate
- stereo
- image capture
- virtual cameras
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/207—Image signal generators using stereoscopic image cameras using a single 2D image sensor
- H04N13/232—Image signal generators using stereoscopic image cameras using a single 2D image sensor using fly-eye lenses, e.g. arrangements of circular lenses
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/282—Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
Definitions
- the present specification relates to methods and apparatuses for determining positions of multi-directional image capture apparatuses.
- Camera pose registration is an important technique used to determine positions and orientations of image capture apparatuses such as cameras.
- this specification describes a method comprising processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
- the first images may be fisheye images.
- Processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images may comprise de-warping the first images and stitching the de-warped images to generate the panoramic images.
- the second images may be rectilinear images.
- the processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
- the panoramic images of each stereo-pair may be offset from each other by a baseline distance.
- the baseline distance to be used may be a predetermined fixed distance.
- the baseline distance to be used may be determined by: minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
- the processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and the cost function may be a weighted average of: re-projection error from the structure from motion algorithm and variance of calculated baseline distances between stereo-pairs of virtual cameras.
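The baseline selection described above can be sketched as follows. This is a minimal illustration, not the specification's implementation: the function name `select_baseline`, the `weight` parameter, and the example numbers are assumptions, and the per-candidate re-projection errors and baseline variances are taken as already computed by a structure from motion run for each candidate baseline.

```python
import numpy as np

def select_baseline(candidates, reproj_errors, baseline_variances, weight=0.5):
    """Pick the candidate baseline with the lowest weighted cost.

    cost = weight * re-projection error + (1 - weight) * baseline variance.
    The weighting scheme and default weight are illustrative assumptions.
    """
    reproj = np.asarray(reproj_errors, dtype=float)
    variance = np.asarray(baseline_variances, dtype=float)
    cost = weight * reproj + (1.0 - weight) * variance
    best_index = int(np.argmin(cost))
    return candidates[best_index], cost

# Hypothetical candidate baselines (metres) and hypothetical error values.
candidates = [0.04, 0.06, 0.08]
best, cost = select_baseline(candidates, [1.2, 0.7, 0.9], [0.30, 0.10, 0.40])
print(best)
```

In practice the two error terms have different units, so they would likely need normalising before being combined; that step is omitted here.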
- the method may further comprise determining a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
- the processing of the plurality of second images may generate respective orientations of the virtual cameras, and the first aspect may further comprise: based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.
- this specification describes apparatus configured to perform any method described with reference to the first aspect.
- this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method described with reference to the first aspect.
- this specification describes apparatus comprising at least one processor, and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: process a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, perform image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, process the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
- the first images may be fisheye images.
- Processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images may comprise de-warping the first images and stitching the de-warped images to generate the panoramic images.
- the second images may be rectilinear images.
- the processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
- the panoramic images of each stereo-pair may be offset from each other by a baseline distance.
- the baseline distance to be used may be a predetermined fixed distance.
- the baseline distance to be used may be determined by: minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
- the processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and the cost function may be a weighted average of: re-projection error from the structure from motion algorithm and variance of calculated baseline distances between stereo-pairs of virtual cameras.
- the computer program code when executed by the at least one processor, may cause the apparatus to determine a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
- the processing of the plurality of second images may generate respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, may cause the apparatus to: determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
- this specification describes a computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of: processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
- this specification describes apparatus comprising means for processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, means for performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
- the apparatus of the sixth aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.
- Figure 1 illustrates an example of multiple multi-directional image capture apparatuses in an environment
- Figures 2A and 2B illustrate examples of ways in which images captured by a multi-directional image capture apparatus are processed
- Figures 3A and 3B illustrate the determination of the position and orientation of a multi-directional image capture apparatus relative to a reference coordinate system
- Figure 4 is a flowchart illustrating examples of various operations which may be performed by an image processing apparatus based on a plurality of images captured by a plurality of multi-directional image capture apparatuses;
- Figure 5 is a schematic diagram of an example configuration of an image processing apparatus configured to perform various operations including those described with reference to Figure 4;
- Figure 6 illustrates an example of a computer-readable storage medium with computer readable instructions stored thereon.
- Figure 1 illustrates a plurality of multi-directional image capture apparatuses 10 located within an environment.
- the multi-directional image capture apparatuses 10 may, in general, be any apparatus capable of capturing images of a scene 13 from multiple different perspectives simultaneously.
- multi-directional image capture apparatus 10 may be a 360° camera system (also known as an omnidirectional camera system or a spherical camera system).
- multi-directional image capture apparatus 10 does not necessarily have to have full angular coverage of its surroundings and may only cover a smaller field of view.
- image used herein may refer generally to visual content. This may be visual content captured by, or derived from visual content captured by, multi-directional image capture apparatus 10. For example, an image may be a photograph or a single frame of a video.
- each multi-directional image capture apparatus 10 may comprise a plurality of cameras 11.
- the term "camera" used herein may refer to a sub-part of a multi-directional image capture apparatus 10 which performs the capturing of images.
- each of the plurality of cameras 11 of multi-directional image capture apparatus 10 may be facing a different direction to each of the other cameras 11 of the multi-directional image capture apparatus 10.
- each camera 11 of a multi-directional image capture apparatus 10 may have a different field of view, thus allowing the multi-directional image capture apparatus 10 to capture images of a scene 13 from different perspectives simultaneously.
- each multi-directional image capture apparatus 10 may be at a different location to each of the other multi-directional image capture apparatuses 10.
- each of the plurality of multi-directional image capture apparatuses 10 may capture images of the environment (via their cameras 11) from different perspectives simultaneously.
- a plurality of multi-directional image capture apparatuses 10 are arranged to capture images of a particular scene 13 within the environment.
- such information may be used for any of the following: performing 3D reconstruction of the captured environment, performing 3D registration of the multi-directional image capture apparatuses 10 with respect to other sensors such as LiDAR (Light Detection and Ranging) or infrared (IR) depth sensors, audio positioning of audio sources, playback of object-based audio with respect to multi-directional image capture apparatus 10 location, and presenting multi-directional image capture apparatus positions as 'hotspots' to which a viewer can switch during virtual reality (VR) viewing.
- VR virtual reality
- GPS Global Positioning System
- GPS only provides position information and does not provide orientation information.
- position information obtained by GPS may not be very accurate and may be susceptible to changes in the quality of the satellite connection.
- One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the multi-directional image capture apparatuses 10.
- such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high.
- position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a multi-directional image capture apparatus 10.
- SfM structure from motion
- SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences.
- When multi-directional image capture apparatuses 10 are used to capture a scene which lacks distinct features/textures (e.g. a corridor), determination of point correspondences between captured images may be unreliable due to the lack of distinct features/textures in the limited field of view of the images.
- Because multi-directional image capture apparatuses 10 typically capture fish-eye images, it may not be possible to address this by capturing fish-eye images with increased field of view, as this will lead to increased distortion of the images which may negatively impact point correspondence determination.
- Figure 2A illustrates one of the plurality of multi-directional image capture apparatuses 10 of Figure 1.
- Each of the cameras 11 of the multi-directional image capture apparatus 10 may capture a respective first image 21.
- Each first image 21 may be an image of a scene within the field of view 20 of its respective camera 11.
- the lens of the camera 11 may be a fish-eye lens and so the first image 21 may be a fish-eye image (in which the camera field of view is enlarged).
- the method described herein may be applicable for use with lenses and resulting images of other types.
- the camera pose registration method described herein may also be applicable to images captured by a camera with a hyperbolic mirror in which the camera optical centre coincides with the focus of the hyperbola, and images captured by a camera with a parabolic mirror and an orthographic lens in which all reflected rays are parallel to the mirror axis and the orthographic lens is used to provide a focused image.
- the first images 21 may be processed to generate a stereo-pair of panoramic images 22.
- Each panoramic image 22 of the stereo-pair may correspond to a different view of a scene captured by the first images 21 from which the stereo-pair is generated.
- one panoramic image 22 of the stereo-pair may represent a left-eye panoramic image and the other one of the stereo-pair may represent a right-eye panoramic image.
- the stereo-pair of panoramic images 22 may be offset from each other by a baseline distance B.
- the generated panoramas may be referred to as spherical (or part-spherical) panoramas in the sense that they may include image data from a sphere (or part of a sphere) around the multi-directional image capture apparatus 10.
- processing the first images to generate the panoramic images may comprise de-warping the first images 21 and then stitching the de-warped images.
- De-warping the first images 21 may comprise re-projecting each of the first images to convert the first images 21 from a fish eye projection to a spherical projection.
- Fish eye to spherical re-projections are generally known in the art and will not be described here in detail.
- Stitching the de-warped images may, in general, be performed using any suitable image stitching technique. Many image stitching techniques are known in the art and will not be described here in detail. Generally, image stitching involves connecting portions of images together based on point correspondences between images (which may involve feature matching).
- the stereo pair may be processed to generate one or more second images 23. More specifically, image re-projection may be performed on each of the panoramic images 22 to generate one or more re-projected second images 23. For example, if the panoramic image 22 is not rectilinear (e.g. if it is curvilinear), it may be re-projected to generate one or more second images 23 which are rectilinear images. As illustrated in Figure 2A, a corresponding set of second images 23 may be generated for each panoramic image 22 of the stereo pair. The type of re-projection may be dependent on the algorithm used to analyse the second images 23.
- the re-projection may be selected so as to generate rectilinear images.
- the re-projection may generate any type of second image 23, as long as the image type is compatible with the algorithm used to analyse the re-projected images 23.
- Each re-projected second image 23 may be associated with a respective virtual camera.
- a virtual camera is an imaginary camera which does not physically exist, but which corresponds to a camera which would have captured the re-projected second image 23 with which it is associated.
- a virtual camera may be defined by virtual camera parameters which represent the configuration of the virtual camera required in order to have captured the second image 23.
- a virtual camera can be treated as a real physical camera.
- each virtual camera has, among other virtual camera parameters, a position and orientation which can be determined.
- the re-projection of each panoramic image 22 may be performed by resampling the panoramic image 22 based on a horizontal array of overlapping sub-portions 22-1 of the panoramic image 22.
- the sub-portions 22-1 may be chosen to be evenly spaced so that adjacent sub-portions 22-1 are separated by the same distance (as illustrated by Figure 2B). As such, the viewing directions of adjacent sub-portions 22-1 may differ by the same angular distance.
- a corresponding re-projected second image 23 may be generated for each sub-portion 22-1.
- each re-projected second image 23 may correspond to a respective virtual pinhole camera.
- the virtual pinhole cameras associated with second images 23 generated from one panoramic image 22 may all have the same position, but different orientations (as illustrated by Figure 3A).
- Each second image 23 generated from one of the stereo-pair of panoramic images 22 may form a stereo pair with a second image 23 from the other one of the stereo-pair of panoramic images 22.
- each stereo-pair of second images 23 may correspond to a stereo-pair of virtual cameras.
- Each stereo-pair of virtual cameras may be offset from each other by the baseline distance as described above.
- any number of second images 23 may be generated. Generally speaking, generating more second images 23 may lead to less distortion in each of the second images 23, but may also increase computational complexity. The precise number of second images 23 may be chosen based on the scene/environment being captured by the multi-directional image capture apparatus 10.
- the methods described with reference to Figures 2A and 2B may be performed for each of a plurality of multi-directional image capture apparatuses 10 which are capturing the same general environment, e.g. the plurality of multi-directional image capture apparatuses 10 as illustrated in Figure 1.
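The resampling of a panoramic image into evenly spaced, overlapping rectilinear views can be sketched as below. This is a rough illustration under stated assumptions: the panorama is taken to be a full 360° equirectangular image, the virtual pinhole cameras differ only in yaw, and sampling is nearest-neighbour. The function names, the 90° field of view, and the choice of six views are hypothetical and not taken from the specification.

```python
import numpy as np

def yaw_rotation(yaw):
    """Rotation about the vertical (y) axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def reproject_rectilinear(pano, yaw, fov_deg=90.0, size=64):
    """Sample one square rectilinear view from an equirectangular panorama."""
    h, w = pano.shape[:2]
    # Focal length (in pixels) of the virtual pinhole camera.
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)
    coords = np.arange(size) - (size - 1) / 2.0
    u, v = np.meshgrid(coords, coords)
    # One viewing ray per output pixel, rotated into the panorama frame.
    rays = np.stack([u, v, np.full_like(u, f)], axis=-1) @ yaw_rotation(yaw).T
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arctan2(rays[..., 1], np.hypot(rays[..., 0], rays[..., 2]))
    # Map longitude/latitude to panorama pixel indices (nearest neighbour).
    col = np.round((lon / (2.0 * np.pi) + 0.5) * (w - 1)).astype(int) % w
    row = np.clip(np.round((lat / np.pi + 0.5) * (h - 1)).astype(int), 0, h - 1)
    return pano[row, col]

# Synthetic panorama whose pixel value equals its column index.
pano = np.tile(np.arange(360.0), (180, 1))
# Six viewing directions spaced 60 degrees apart; with a 90 degree field of
# view, adjacent views overlap.
yaws = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
views = [reproject_rectilinear(pano, yaw) for yaw in yaws]
```

Each returned view corresponds to one virtual pinhole camera; all six share the same position and differ only in orientation, matching the arrangement described for a single panoramic image.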
- all of the first images 21 captured by a plurality of multi-directional image capture apparatuses 10 of a particular scene may be processed as described above.
- the first images 21 may correspond to images of a scene at a particular moment in time.
- a first image 21 may correspond to a single video frame of a single camera 11, and all of the first images 21 may be video frames that are captured at the same moment in time.
- Figures 3A and 3B illustrate the process of determining the position and orientation of a multi-directional image capture apparatus 10.
- each arrow 31, 32 represents the position and orientation of a particular element in a reference coordinate system 30.
- the base of the arrow represents the position and the direction of the arrow represents the orientation.
- each arrow 31 in Figure 3A represents the position and orientation of a virtual camera associated with a respective second image 23
- the arrow 32 in Figure 3B represents the position and orientation of the multi-directional image capture apparatus 10.
- the second images 23 may be processed to generate respective positions of the virtual cameras associated with the second images 23.
- the output of the processing for one multi-directional image capture apparatus 10 is illustrated by Figure 3A.
- the processing may include generating the positions of a set of virtual cameras for each panoramic image 22 of the stereo-pair of panoramic images.
- one set of arrows 33A may correspond to virtual cameras of one of the stereo-pair of panoramic images 22, and the other set of arrows 33B may correspond to virtual cameras of the other one of the stereo-pair of panoramic images.
- the generated positions may be relative to the reference coordinate system 30.
- the processing of the second images may also generate respective orientations of the virtual cameras relative to the reference coordinate system 30.
- all of the virtual cameras of each set of virtual cameras, which correspond to the same panoramic image 22, may have the same position but different orientations.
- It may be necessary for the multi-directional image capture apparatuses 10 to have at least partially overlapping fields of view with each other (for example, in order to allow point correspondence determination as described below).
- the above described processing may be performed by using a structure from motion (SfM) algorithm to determine the position and orientation of each of the virtual cameras.
- the SfM algorithm may operate by determining point correspondences between various ones of the second images 23 and determining the positions and orientations of the virtual cameras based on the determined point correspondences.
- the determined point correspondences may impose certain geometric constraints on the positions and orientations of the virtual cameras, which can be used to solve a set of quadratic equations to determine the positions and orientations of the virtual cameras relative to the reference coordinate system 30.
- the SfM process may involve any one of or any combination of the following operations: extracting image features, matching image features, estimating camera position, reconstructing 3D points, and performing bundle adjustment.
- Once the positions of the virtual cameras have been determined, the position of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined based on the determined positions of the virtual cameras. Similarly, once the orientations of the virtual cameras have been determined, the orientation of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined based on the determined orientations of the virtual cameras.
- the position of the multi-directional image capture apparatus 10 may be determined by averaging the positions of the two sets 33A, 33B of virtual cameras illustrated by Figure 3A. For example, as illustrated, all of the virtual cameras of one set 33A may have the same position as each other and all of the virtual cameras of the other set 33B may also have the same position as each other. As such, the position of the multi-directional image capture apparatus 10 may be determined to be the average of the two respective positions of the two sets 33A, 33B of virtual cameras.
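The position averaging described above can be sketched as follows. The function name `device_position` and the example coordinates are assumptions for illustration. Each set is first reduced to a single position (a mean, which is a no-op when all cameras in a set share one position), and the two set positions are then averaged.

```python
import numpy as np

def device_position(set_a_positions, set_b_positions):
    """Midpoint of the two virtual-camera sets (one per panorama of the stereo pair)."""
    pos_a = np.mean(np.asarray(set_a_positions, dtype=float), axis=0)
    pos_b = np.mean(np.asarray(set_b_positions, dtype=float), axis=0)
    return 0.5 * (pos_a + pos_b)

# Hypothetical positions in the reference coordinate system: three virtual
# cameras per set, each set sharing a single position.
set_a = [[0.0, 0.0, 0.0]] * 3
set_b = [[2.0, 0.0, 0.0]] * 3
position = device_position(set_a, set_b)
```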
- the orientation of the multi-directional image capture apparatus 10 may be determined by averaging the orientation of the virtual cameras.
- the orientation of the multi-directional image capture apparatus 10 may be determined in the following way.
- the orientation of each virtual camera may be represented by rotation matrix $R_i$.
- the orientation of the multi-directional image capture apparatus 10 may be represented by rotation matrix $R_{dev}$.
- the orientation of each virtual camera relative to the multi-directional image capture apparatus 10 may be known, and may be represented by rotation matrix $R_{idev}$.
- the rotation matrices $R_i$ of the virtual cameras may be used to obtain a rotation matrix for the multi-directional image capture apparatus 10 according to: $R_{dev} = R_i R_{idev}^{-1}$
- the rotation matrix of a multi-directional image capture apparatus can be determined by multiplying the rotation matrix of a virtual camera ($R_i$) onto the inverse of the matrix representing the orientation of the virtual camera relative to the orientation of the multi-directional image capture apparatus ($R_{idev}^{-1}$).
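As a small numerical check of this relationship, the sketch below recovers a device rotation from each virtual camera by multiplying the virtual camera's rotation matrix by the inverse of its known relative rotation. The helper names and the example angles are assumptions, as is the composition convention used to synthesise the test data (virtual rotation = device rotation times relative rotation), which is the convention consistent with the stated formula. For a rotation matrix the inverse equals the transpose.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def device_rotations(virtual_rotations, relative_rotations):
    """One device-rotation estimate per virtual camera: R_dev = R_i @ inv(R_idev)."""
    return [r_i @ r_idev.T for r_i, r_idev in zip(virtual_rotations, relative_rotations)]

# Hypothetical ground truth and six virtual cameras spaced 60 degrees apart.
r_dev_true = rot_y(0.3)
relative = [rot_y(k * np.pi / 3.0) for k in range(6)]
virtual = [r_dev_true @ r for r in relative]
estimates = device_rotations(virtual, relative)
```

With noise-free inputs every estimate equals the true device rotation; with real SfM output the estimates differ slightly, which is why an averaging step follows.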
- twelve rotation matrices are obtained for the orientation of the multi-directional image capture apparatus 10.
- Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for the multi-directional image capture apparatus 10.
- the set of Euler angles may then be averaged and converted into a final rotation matrix representing the orientation of the multi-directional image capture apparatus 10.
- the set of Euler angles may then be averaged according to: $\theta_f = \arctan\left(\frac{\sum_i \sin\theta_i}{\sum_i \cos\theta_i}\right)$
- $\theta_f$ represents the averaged Euler angles for a multi-directional image capture apparatus 10 and $\theta_i$ represents the set of Euler angles.
- the averaged Euler angles are determined by calculating the sum of the sines of the set of Euler angles divided by the sum of the cosines of the set of Euler angles, and taking the arctangent of the ratio. $\theta_f$ may then be converted back into a rotation matrix representing the final determined orientation of multi-directional image capture apparatus 10.
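The angle averaging described above can be sketched as follows. The function name is hypothetical, and `np.arctan2` is used in place of a plain arctangent of the ratio so that the result lands in the correct quadrant and a zero cosine sum does not cause a division by zero; that substitution is an implementation choice, not something stated in the specification.

```python
import numpy as np

def average_euler_angles(angles):
    """Circular mean of Euler angles (radians), computed per axis.

    `angles` has shape (N, 3): one row of three Euler angles per estimate.
    """
    a = np.asarray(angles, dtype=float)
    return np.arctan2(np.sin(a).sum(axis=0), np.cos(a).sum(axis=0))

# Two estimates whose first angle straddles the wrap-around point
# (350 degrees and 10 degrees); a plain arithmetic mean would give 180.
angles = np.radians([[350.0, 20.0, 0.0], [10.0, 40.0, 0.0]])
mean_angles = average_euler_angles(angles)
```

The example shows why sines and cosines are summed rather than the raw angles: the first averaged angle comes out at 0° rather than 180°.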
- unit quaternions may be used instead of Euler angles for the abovementioned process.
- the use of unit quaternions to represent orientation is a known mathematical technique and will not be described in detail here. Briefly, quaternions $q_1, q_2, \ldots, q_N$ corresponding to the virtual camera rotation matrices may be determined. Then, the quaternions may be transformed, as necessary, to ensure that they are all on the same side of the 4D hypersphere. Specifically, one representative quaternion $q_M$ is selected and the signs of any quaternions $q_i$ where the dot product of $q_M$ and $q_i$ is less than zero may be inverted.
- all quaternions $q_i$ may be summed into an average quaternion $q_A$, and $q_A$ may be normalised into a unit quaternion $q_A'$.
- the unit quaternion $q_A'$ may represent the averaged orientation of the camera and may be converted back to other orientation representations as desired. Using unit quaternions to represent orientation may be more numerically stable than Euler angles.
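The quaternion averaging steps can be sketched as below. The function name, the (w, x, y, z) component order, and the choice of the first quaternion as the representative are assumptions made for illustration.

```python
import numpy as np

def average_quaternions(quaternions, ref_index=0):
    """Average unit quaternions by sign alignment, summation and normalisation."""
    q = np.asarray(quaternions, dtype=float)
    ref = q[ref_index]
    # Flip any quaternion lying on the opposite hemisphere to the reference
    # (q and -q describe the same rotation).
    signs = np.where(q @ ref < 0.0, -1.0, 1.0)
    q_sum = (q * signs[:, None]).sum(axis=0)
    return q_sum / np.linalg.norm(q_sum)

# The identity rotation written with both signs; without the sign flip the
# sum would be the zero vector and could not be normalised.
quats = [[1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
q_avg = average_quaternions(quats)
```

This simple sum-and-normalise average is a reasonable approximation when the orientations being averaged are close to one another, which is the expected case for estimates of a single apparatus.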
- the generated positions of the virtual cameras may be in units of pixels. Therefore, in order to enable scale conversions between pixels and a real world distance (e.g. metres), a pixel to real world distance conversion factor may be determined. This may be performed by determining the baseline distance B of a stereo-pair of virtual cameras in both pixels and in a real world distance.
- the baseline distance in pixels may be determined from the determined positions of the virtual cameras in the reference coordinate system 30.
- the baseline distance in a real world distance (e.g. metres) may be known already from being set initially during the generation of the panoramic images 22.
- the pixel to real world distance conversion factor may then be simply calculated by taking the ratio of the two distances.
- This may be further refined by calculating the conversion factor based on each of the stereo-pairs of virtual cameras, determining outliers and inliers (as described in more detail below), and averaging the inliers to obtain a final pixel to real world distance conversion factor.
- the pixel to real world distance conversion factor may be denoted $S_{pixel2meter}$ in the present specification.
- the inlier and outlier determination may be performed according to:
  $$d_i = \left|S_i - \operatorname{median}(S)\right|, \qquad d_\sigma = \operatorname{median}(\{d_i\}), \qquad S_i \text{ is an inlier if } \frac{d_i}{d_\sigma} < m$$
- $S$ is the set of pixel to real world distance ratios of all stereo-pairs of virtual cameras
- $d_i$ is a measure of the difference between a pixel to real world distance ratio and the median of all pixel to real world distance ratios
- $d_\sigma$ is the median absolute deviation (MAD)
- $m$ is a threshold value below which a determined pixel to real world distance ratio is considered an inlier (for example, $m$ may be set to be 2).
- the MAD may be used as it may be a robust and consistent estimator of inlier errors, which follow a Gaussian distribution.
- a pixel to real world distance ratio may be determined to be an inlier if the difference between its value and the median value divided by the median absolute deviation is less than a threshold value. That is to say, for a pixel to real world distance ratio to be considered an inlier, the difference between its value and the median value must be less than a threshold number of times larger than the median absolute deviation.
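The inlier test and the subsequent averaging may be sketched as below. The function name `robust_conversion_factor` and the handling of a zero MAD (keeping only the ratios equal to the median) are assumptions of this sketch; the specification does not say how that degenerate case is treated.

```python
from statistics import median

def robust_conversion_factor(ratios, m=2.0):
    """Average the pixel to real world distance ratios that pass the MAD test."""
    med = median(ratios)
    deviations = [abs(r - med) for r in ratios]
    mad = median(deviations)  # median absolute deviation
    if mad == 0.0:
        # Assumed fallback: at least half of the ratios equal the median,
        # so keep only those rather than dividing by zero.
        inliers = [r for r, d in zip(ratios, deviations) if d == 0.0]
    else:
        inliers = [r for r, d in zip(ratios, deviations) if d / mad < m]
    return sum(inliers) / len(inliers), inliers
```

For example, with ratios `[100.0, 101.0, 99.0, 100.5, 99.5, 150.0]` and `m = 2`, the median is 100.25 and the MAD is 0.75, so the value 150.0 is rejected and the remaining five ratios average to 100.0.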
- the relative positions of the plurality of multi-directional image capture apparatuses may be determined according to:
  $$d_{ij} = \frac{d^{dev}_j - d^{dev}_i}{S_{pixel2meter}}$$
- $d_{ij}$ represents the relative position of one of the plurality of multi-directional image capture apparatuses (apparatus j) relative to another one of the plurality of multi-directional image capture apparatuses (apparatus i).
- $d^{dev}_j$ is the position of apparatus j and $d^{dev}_i$ is the position of apparatus i.
- $S_{pixel2meter}$ is the pixel to real world distance conversion factor.
- a vector representing the relative position of one of the plurality of multi-directional image capture apparatuses relative to another one of the plurality of multi-directional image capture apparatuses may be determined by taking the difference between their positions. This may be divided by the pixel-to-real world distance conversion factor depending on the scale desired. As such, the positions of all of the multi-directional image capture apparatuses 10 relative to one another may be determined in the reference coordinate system 30.
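The conversion factor and the relative position calculation may be sketched together as follows. The function names, the representation of positions as 3-tuples, and the convention that the conversion factor is expressed in pixels per metre (so that dividing by it yields metres) are assumptions of this sketch.

```python
import math

def conversion_factor(cam_left_px, cam_right_px, baseline_m):
    """Pixels per metre, from one stereo-pair of virtual camera positions."""
    return math.dist(cam_left_px, cam_right_px) / baseline_m

def relative_position(pos_i, pos_j, s_pixel2meter=None):
    """Position of apparatus j relative to apparatus i.

    With s_pixel2meter=None the result stays in pixel units; otherwise the
    difference is divided by the factor to obtain a real world distance.
    """
    diff = tuple(b - a for a, b in zip(pos_i, pos_j))
    if s_pixel2meter is None:
        return diff
    return tuple(c / s_pixel2meter for c in diff)
```

For instance, `relative_position((0, 0, 0), (200, 0, -100), 100.0)` gives `(2.0, 0.0, -1.0)`, i.e. apparatus j sits 2 m along the first axis and 1 m back along the third axis from apparatus i under the stated convention.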
- the baseline distance B described above may be chosen in two different ways.
- One way is to set a predetermined fixed baseline distance (e.g. based on the average human interpupillary distance) to be used to generate stereo-pairs of panoramic images. This fixed baseline distance may then be used to generate all of the stereo-pairs of panoramic images.
- An alternative way is to treat B as a variable within a range (e.g. a range constrained by the dimensions of the multi-directional image capture apparatus) and to evaluate a cost function for each value of B within the range. For example, this may be performed by minimising a cost function which indicates an error associated with the use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
- the cost function may be defined as the weighted average of the re-projection error from the structure from motion algorithm and the variance of calculated baseline distances between stereo-pairs of virtual cameras.
- the above process may involve generating stereo-pairs of panoramic images for each value of B, generating re-projected second images from the stereo-pairs, and inputting the second images for each value of B into a structure from motion algorithm, as described above.
- the re-projection error from the structure from motion algorithm may be representative of a global registration quality and the variance of calculated baseline distances may be representative of the local registration uncertainty.
- the baseline distance with the lowest cost (and therefore lowest error) may be found, and this may be used as the baseline distance used to determine the position/orientation of the multi-directional image capture apparatus 10.
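The variable-baseline search may be sketched as a simple sweep over candidate values. Everything named here is hypothetical: `evaluate` stands in for the full chain (generating stereo-pairs for a given B, re-projecting them and running structure from motion) and is assumed to return the re-projection error together with the list of calculated baseline distances; the weight of 0.5 and the use of the population variance are likewise assumptions.

```python
from statistics import pvariance

def baseline_cost(reprojection_error, baseline_estimates, weight=0.5):
    """Weighted average of re-projection error and variance of baseline estimates."""
    return weight * reprojection_error + (1.0 - weight) * pvariance(baseline_estimates)

def select_baseline(candidates, evaluate, weight=0.5):
    """Return the candidate baseline with the lowest cost, plus all costs.

    evaluate(b) is a caller-supplied (hypothetical) function returning
    (reprojection_error, baseline_estimates) for baseline distance b.
    """
    costs = {b: baseline_cost(*evaluate(b), weight=weight) for b in candidates}
    best = min(costs, key=costs.get)
    return best, costs
```

Because each candidate requires a full structure from motion run, the number of candidates within the range would in practice probably be kept small.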
- Figure 4 is a flowchart showing examples of operations as described herein.
- a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received.
- image data corresponding to the first images 21 may be received at image processing apparatus 50 (see Figure 5).
- the first images 21 may be processed to generate a plurality of stereo-pairs of panoramic images 22.
- the stereo-pairs of panoramic images 22 may be re-projected to generate re-projected second images 23.
- the second images 23 from operation 4.3 may be processed to obtain positions and orientations of virtual cameras.
- the second images 23 may be processed using a structure from motion algorithm.
- a pixel-to-real world distance conversion factor may be determined based on the positions of the virtual cameras determined at operation 4.4 and a baseline distance between stereo-pairs of panoramic images 22.
- positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the virtual cameras 11 determined at operation 4.4.
- positions of the plurality of multi-directional image capture apparatuses 10 relative to each other may be determined based on the positions of the plurality of multi-directional image capture apparatuses 10 determined at operation 4.7.
- the position of a virtual camera may be the position of the centre of a virtual lens of the virtual camera.
- the position of the multi-directional image capture apparatus 10 may be the centre of the multi-directional image capture apparatus (e.g. if a multi-directional image capture apparatus is spherically shaped, its position may be defined as the geometric centre of the sphere).
- FIG. 5 is a schematic block diagram of an example configuration of image processing (or more simply, computing) apparatus 50, which may be configured to perform any of or any combination of the operations described herein.
- the computing apparatus 50 may comprise memory 51, processing circuitry 52, an input 53, and an output 54.
- the processing circuitry 52 may be of any suitable composition and may include one or more processors 52A of any suitable type or suitable combination of types.
- the processing circuitry 52 may be a programmable processor that interprets computer program instructions and processes data.
- the processing circuitry 52 may include plural programmable processors.
- the processing circuitry 52 may be, for example, programmable hardware with embedded firmware.
- the processing circuitry 52 may be termed processing means.
- the processing circuitry 52 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs).
- processing circuitry 52 may be referred to as computing apparatus.
- the processing circuitry 52 described with reference to Figure 5 may be coupled to the memory 51 (or one or more storage devices) and may be operable to read/write data to/from the memory.
- the memory 51 may store thereon computer readable instructions 512A which, when executed by the processing circuitry 52, may cause any one of or any combination of the operations described herein to be performed.
- the memory 51 may comprise a single memory unit or a plurality of memory units upon which the computer-readable instructions (or code) 512A are stored.
- the memory 51 may comprise both volatile memory 511 and non-volatile memory 512.
- the computer readable instructions 512A may be stored in the non-volatile memory 512 and may be executed by the processing circuitry 52 using the volatile memory 511 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc.
- the memories 51 in general may be referred to as non-transitory computer readable memory media.
- the input 53 may be configured to receive image data representing the first images 21 described herein.
- the image data may be received, for instance, from the multi-directional image capture apparatuses 10 themselves or may be received from a storage device.
- the output 54 may be configured to output any of or any combination of the camera pose registration information described herein. As discussed above, the camera pose registration information output by the computing apparatus 50 may be used for various functions as described above with reference to Figure 1.
- Figure 6 illustrates an example of a computer-readable medium 60 with computer-readable instructions (code) stored thereon.
- the computer-readable instructions (code) when executed by a processor, may cause any one of or any combination of the operations described above to be performed.
- Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
- the software, application logic and/or hardware may reside on memory, or any computer media.
- the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
- a "memory" or “computer-readable medium” may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer. Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc.
- references to computer program, instructions, code etc. should be understood to express software for a programmable processor firmware such as the programmable content of a hardware device as instructions for a processor or configured or configuration settings for a fixed function device, gate array, programmable logic device, etc.
- circuitry refers to all of the following: (a) hardware- only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Studio Devices (AREA)
- Image Processing (AREA)
- Stereoscopic And Panoramic Photography (AREA)
- Length Measuring Devices By Optical Means (AREA)
Abstract
This specification describes a method comprising processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
Description
Methods and Apparatuses for Determining Positions of Multi-Directional Image Capture Apparatuses
Technical Field
The present specification relates to methods and apparatuses for determining positions of multi-directional image capture apparatuses.
Background
Camera pose registration is an important technique used to determine positions and orientations of image capture apparatuses such as cameras. The recent advent of commercial multi-directional image capture apparatuses, such as 360° camera systems, brings new challenges with regard to the performance of camera pose registration in a reliable, accurate and efficient manner.
Summary
According to a first aspect, this specification describes a method comprising processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
The first images may be fisheye images.
Processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images may comprise de-warping the first images and stitching the de-warped images to generate the panoramic images. The second images may be rectilinear images.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras. The panoramic images of each stereo-pair may be offset from each other by a baseline distance.
The baseline distance to be used may be a predetermined fixed distance. The baseline distance to be used may be determined by: minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and the cost function may be a weighted average of: re-projection error from the structure from motion algorithm and variance of calculated baseline distances between stereo-pairs of virtual cameras.
The method may further comprise determining a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used. The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the first aspect may further comprise: based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.
According to a second aspect, this specification describes apparatus configured to perform any method described with reference to the first aspect.
According to a third aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method described with reference to the first aspect.
According to a fourth aspect, this specification describes apparatus comprising at least one processor, and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: process a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, perform image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, process the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
The first images may be fisheye images.
Processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images may comprise de-warping the first images and stitching the de-warped images to generate the panoramic images.
The second images may be rectilinear images.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
The panoramic images of each stereo-pair may be offset from each other by a baseline distance.
The baseline distance to be used may be a predetermined fixed distance.
The baseline distance to be used may be determined by: minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and the cost function may be a weighted average of: re-projection error from the structure from motion algorithm and variance of calculated baseline distances between stereo-pairs of virtual cameras.
The computer program code, when executed by the at least one processor, may cause the apparatus to determine a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, may cause the apparatus to: determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
According to a fifth aspect, this specification describes computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causes performance of: processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
The computer-readable code stored on the medium of the fifth aspect may further cause performance of any of the operations described with reference to the method of the first aspect.
According to a sixth aspect, this specification describes apparatus comprising means for processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, means for performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
The apparatus of the sixth aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.
Brief Description of the Drawings
For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following description taken in connection with the accompanying drawings, in which:
Figure 1 illustrates an example of multiple multi-directional image capture apparatuses in an environment;
Figures 2A and 2B illustrate examples of ways in which images captured by a multi-directional image capture apparatus are processed;
Figures 3A and 3B illustrate the determination of the position and orientation of a multi-directional image capture apparatus relative to a reference coordinate system;
Figure 4 is a flowchart illustrating examples of various operations which may be performed by an image processing apparatus based on a plurality of images captured by a plurality of multi-directional image capture apparatuses;
Figure 5 is a schematic diagram of an example configuration of an image processing apparatus configured to perform various operations including those described with reference to Figure 4;
Figure 6 illustrates an example of a computer-readable storage medium with computer readable instructions stored thereon.
Detailed Description
In the description and drawings, like reference numerals may refer to like elements throughout.
Figure 1 illustrates a plurality of multi-directional image capture apparatuses 10 located within an environment. The multi-directional image capture apparatuses 10 may, in general, be any apparatus capable of capturing images of a scene 13 from multiple different perspectives simultaneously. For example, multi-directional image capture apparatus 10 may be a 360° camera system (also known as an omnidirectional camera system or a spherical camera system). However, it will be appreciated that multi-directional image capture apparatus 10 does not necessarily have to have full angular coverage of its surroundings and may only cover a smaller field of view.
The term "image" used herein may refer generally to visual content. This may be visual content captured by, or derived from visual content captured by, multi-directional image capture apparatus 10. For example, an image may be a photograph or a single frame of a video.
As illustrated in Figure 1, each multi-directional image capture apparatus 10 may comprise a plurality of cameras 11. The term "camera" used herein may refer to a sub-part of a multi-directional image capture apparatus 10 which performs the capturing of images. As illustrated, each of the plurality of cameras 11 of multi-directional image capture apparatus 10 may be facing a different direction to each of the other cameras 11 of the multi-directional image capture apparatus 10. As such, each camera 11 of a multi-directional image capture apparatus 10 may have a different field of view, thus allowing the multi-directional image capture apparatus 10 to capture images of a scene 13 from different perspectives simultaneously.
Similarly, as illustrated in Figure 1, each multi-directional image capture apparatus 10 may be at a different location to each of the other multi-directional image capture apparatuses 10. Thus, each of the plurality of multi-directional image capture apparatuses 10 may capture images of the environment (via their cameras 11) from different perspectives simultaneously.
In the example scenario illustrated in Figure 1, a plurality of multi-directional image capture apparatuses 10 are arranged to capture images of a particular scene 13 within the environment. In such circumstances, it may be desirable to perform camera pose registration in order to determine the position and orientation of each of the multi-directional image capture apparatuses 10. In particular, it may be desirable to determine these positions and orientations relative to a particular reference coordinate system. This allows the overall arrangement of the multi-directional image capture apparatuses 10 relative to each other to be determined, which may be useful for a number of functions. For example, such information may be used for any of the following: performing 3D reconstruction of the captured environment, performing 3D registration of the multi-directional image capture apparatuses 10 with respect to other sensors such as LiDAR (Light Detection and Ranging) or infrared (IR) depth sensors, audio positioning of audio sources, playback of object-based audio with respect to multi-directional image capture apparatus 10 location, and presenting multi-directional image capture apparatus positions as 'hotspots' to which a viewer can switch during virtual reality (VR) viewing.
One way of determining the positions of multi-directional image capture apparatuses 10 is to use Global Positioning System (GPS) localization. However, GPS only provides position information and does not provide orientation information. In addition, position information obtained by GPS may not be very accurate and may be susceptible to changes in the quality of the satellite connection. One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the multi-directional image capture apparatuses 10. However, such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high.
Another way of performing camera pose registration is to use a computer vision method. For example, position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a multi-directional image capture apparatus 10. Broadly speaking, SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences.
However, when multi-directional image capture apparatuses 10 are used to capture a scene which lacks distinct features/textures (e.g. a corridor), determination of point correspondences between captured images may be unreliable due to the lack of distinct features/textures in the limited field of view of the images. In addition, since multi-directional image capture apparatuses 10 typically capture fish-eye images, it may not be possible to address this by capturing fish-eye images with increased field of view, as this will lead to increased distortion of the images which may negatively impact point correspondence determination.
A computer vision method for performing camera pose registration which may address some or all of the challenges mentioned above will now be described.
Figure 2A illustrates one of the plurality of multi-directional image capture apparatuses 10 of Figure 1. Each of the cameras 11 of the multi-directional image capture apparatus 10 may capture a respective first image 21. Each first image 21 may be an image of a scene within the field of view 20 of its respective camera 11. In some examples, the lens of the camera 11 may be a fish-eye lens and so the first image 21 may be a fish-eye image (in which the camera field of view is enlarged). However, the method described herein may be applicable for use with lenses and resulting images of other types. More specifically, the camera pose registration method described herein may also be applicable to images captured by a camera with a hyperbolic mirror in which the camera optical centre coincides with the focus of the hyperbola, and images captured by a camera with a parabolic mirror and an orthographic lens in which all reflected rays are parallel to the mirror axis and the orthographic lens is used to provide a focused image.
The first images 21 may be processed to generate a stereo-pair of panoramic images 22. Each panoramic image 22 of the stereo-pair may correspond to a different view of a scene captured by the first images 21 from which the stereo-pair is generated. For example, one panoramic image 22 of the stereo-pair may represent a left-eye panoramic image and the other one of the stereo-pair may represent a right-eye panoramic image. As such, the stereo-pair of panoramic images 22 may be offset from each other by a baseline distance B. By generating panoramic images 22 as an initial step, the effective field of view may be increased, which may allow the methods described herein to better deal with scenes which lack distinct textures (e.g. corridors).
The generated panoramas may be referred to as spherical (or part-spherical) panoramas in the sense that they may include image data from a sphere (or part of a sphere) around the multi-directional image capture apparatus 10.
If the first images 21 are fish-eye images, processing the first images to generate the panoramic images may comprise de-warping the first images 21 and then stitching the de-warped images. De-warping the first images 21 may comprise re-projecting each of the first images to convert the first images 21 from a fish-eye projection to a spherical projection. Fish-eye to spherical re-projections are generally known in the art and will not be described here in detail. Stitching the de-warped images may, in general, be performed using any suitable image stitching technique. Many image stitching techniques are known in the art and will not be described here in detail. Generally, image stitching involves connecting portions of images together based on point correspondences between images (which may involve feature matching).
Following the generation of the stereo-pair of panoramic images 22, the stereo-pair may be processed to generate one or more second images 23. More specifically, image re-projection may be performed on each of the panoramic images 22 to generate one or more re-projected second images 23. For example, if the panoramic image 22 is not rectilinear (e.g. if it is curvilinear), it may be re-projected to generate one or more second images 23 which are rectilinear images. As illustrated in Figure 2A, a corresponding set of second images 23 may be generated for each panoramic image 22 of the stereo-pair. The type of re-projection may be dependent on the algorithm used to analyse the second images 23. For instance, as is explained below, structure from motion algorithms, which are typically used to analyse rectilinear images, may be used, in which case the re-projection may be selected so as to generate rectilinear images. However, it will be appreciated that, in general, the re-projection may generate any type of second image 23, as long as the image type is compatible with the algorithm used to analyse the re-projected images 23.
Each re-projected second image 23 may be associated with a respective virtual camera. A virtual camera is an imaginary camera which does not physically exist, but which corresponds to a camera which would have captured the re-projected second image 23 with which it is associated. A virtual camera may be defined by virtual camera parameters which represent the configuration of the virtual camera required in order to have captured the second image 23. As such, for the purposes of the methods and operations described herein, a virtual camera can be treated as a real physical camera. For example, each virtual camera has, among other virtual camera parameters, a position and orientation which can be determined.
As illustrated by Figure 2B, the processing of each panoramic image 22 may be performed by resampling the panoramic image 22 based on a horizontal array of overlapping sub-portions 22-1 of the panoramic image 22. The sub-portions 22-1 may be chosen to be evenly spaced so that adjacent sub-portions 22-1 are separated by the same distance (as illustrated by Figure 2B). As such, the viewing directions of adjacent sub-portions 22-1 may differ by the same angular distance. A corresponding re-projected second image 23 may be generated for each sub-portion 22-1. This may be performed by casting rays following the pinhole camera model (which represents a first order approximation of the mapping from the spherical (3D) panorama to the 2D second images) based on a given field of view (e.g. 120 degrees) of each sub-portion 22-1 from a single viewpoint to the panoramic image 22. As such, each re-projected second image 23 may correspond to a respective virtual pinhole camera. The virtual pinhole cameras associated with second images 23 generated from one panoramic image 22 may all have the same position, but different orientations (as illustrated by Figure 3A).
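By way of illustration only, the ray casting described above might be sketched as follows. The sketch assumes an equirectangular panorama layout (longitude across the width, latitude down the height), a square output image, rotation about the vertical axis only, and nearest-neighbour sampling; none of these choices is specified by the present description, and the function name `reproject_to_pinhole` is hypothetical.

```python
import numpy as np

def reproject_to_pinhole(pano, yaw_deg, fov_deg=120.0, out_size=256):
    """Sample one rectilinear view from an equirectangular panorama (assumed layout)."""
    pano_h, pano_w = pano.shape[:2]
    # Focal length in pixels for the requested field of view of the virtual camera.
    f = (out_size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    # Pixel grid of the virtual image, centred on the principal point.
    offsets = np.arange(out_size) - (out_size - 1) / 2.0
    x, y = np.meshgrid(offsets, offsets)
    # One ray per pixel in the virtual camera frame (x right, y down, z forward).
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate the rays about the vertical axis to the viewing direction of the sub-portion.
    yaw = np.radians(yaw_deg)
    rot = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(yaw), 0.0, np.cos(yaw)]])
    dirs = dirs @ rot.T
    # Convert each ray to longitude/latitude, then to panorama pixel indices.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    u = ((lon / (2.0 * np.pi) + 0.5) * pano_w).astype(int) % pano_w
    v = np.clip(((lat / np.pi + 0.5) * pano_h).astype(int), 0, pano_h - 1)
    return pano[v, u]
```

Evenly spaced viewing directions, as in Figure 2B, could then be obtained by calling the function with yaw values such as `np.arange(n) * 360.0 / n` for `n` sub-portions.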
Each second image 23 generated from one of the stereo-pair of panoramic images 22 may form a stereo pair with a second image 23 from the other one of the stereo-pair of panoramic images 22. As such, each stereo-pair of second images 23 may correspond to a stereo-pair of virtual cameras. Each stereo-pair of virtual cameras may be offset from each other by the baseline distance as described above.
It will be appreciated that, in general, any number of second images 23 may be generated. Generally speaking, generating more second images 23 may lead to less distortion in each of the second images 23, but may also increase computational complexity. The precise number of second images 23 may be chosen based on the scene/environment being captured by the multi-directional image capture apparatus 10.
The methods described with reference to Figures 2A and 2B may be performed for each of a plurality of multi-directional image capture apparatuses 10 which are capturing the same general environment, e.g. the plurality of multi-directional image capture apparatuses 10 as illustrated in Figure 1. In this way, all of the first images 21 captured by a plurality of multi-directional image capture apparatuses 10 of a particular scene may be processed as described above. It will be appreciated that the first images 21 may correspond to images of a scene at a particular moment in time. For example, if the multi-directional image capture apparatuses 10 are capturing video images, a first image 21 may correspond to a single video frame of a single camera 11, and all of the first images 21 may be video frames that are captured at the same moment in time.
Figures 3A and 3B illustrate the process of determining the position and orientation of a multi-directional image capture apparatus 10. In Figures 3A and 3B, each arrow 31, 32 represents the position and orientation of a particular element in a reference coordinate system 30. The base of the arrow represents the position and the direction of the arrow represents the orientation. More specifically, each arrow 31 in Figure 3A represents the position and orientation of a virtual camera associated with a respective second image 23, and the arrow 32 in Figure 3B represents the position and orientation of the multi-directional image capture apparatus 10.
After generating the second images 23, the second images 23 may be processed to generate respective positions of the virtual cameras associated with the second images 23. The output of the processing for one multi-directional image capture apparatus 10 is illustrated by Figure 3A. The processing may include generating the positions of a set of virtual cameras for each panoramic image 22 of the stereo-pair of panoramic images. As illustrated by Figure 3A, one set of arrows 33A may correspond to virtual cameras of one of the stereo-pair of panoramic images 22, and the other set of arrows 33B may correspond to virtual cameras of the other one of the stereo-pair of panoramic images. The generated positions may be relative to the reference coordinate system 30. The processing of the second images may also generate respective orientations of the virtual cameras relative to the reference coordinate system 30. As mentioned above and illustrated by Figure 3A, all of the virtual cameras of each set of virtual cameras, which correspond to the same panoramic image 22, may have the same position but different orientations.
It will be appreciated that, in order to perform the processing for a plurality of multi-directional image capture apparatuses 10, it may be necessary for the multi-directional image capture apparatuses 10 to have at least partially overlapping fields of view with each other (for example, in order to allow point correspondence determination as described below).
The above described processing may be performed by using a structure from motion (SfM) algorithm to determine the position and orientation of each of the virtual cameras. The SfM algorithm may operate by determining point correspondences between various ones of the second images 23 and determining the positions and orientations of the virtual cameras based on the determined point correspondences. For example, the determined point correspondences may impose certain geometric constraints on the positions and orientations of the virtual cameras, which can be used to solve a set of quadratic equations to determine the positions and orientations of the virtual cameras relative to the reference coordinate system 30. More specifically, in some examples, the SfM process may involve any one of or any combination of the following operations: extracting image features, matching image features, estimating camera position, reconstructing 3D points, and performing bundle adjustment.
Once the positions of the virtual cameras have been determined, the position of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined based on the determined positions of the virtual cameras. Similarly, once the orientations of the virtual cameras have been determined, the orientation of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined based on the determined orientations of the virtual cameras. The position of the multi-directional image capture apparatus 10 may be determined by averaging the positions of the two sets 33A, 33B of virtual cameras illustrated by Figure 3A. For example, as illustrated, all of the virtual cameras of one set 33A may have the same position as each other and all of the virtual cameras of the other set 33B may also have the same position as each other. As such, the position of the multi-directional image capture apparatus 10 may be determined to be the average of the two respective positions of the two sets 33A, 33B of virtual cameras.
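As a minimal sketch of the position averaging described above, the following assumes that the virtual camera positions of each set are supplied as arrays of 3D coordinates. Averaging within each set before averaging the two set positions is an assumption made here to tolerate small numerical differences between cameras that nominally share a position; the function name `device_position` is hypothetical.

```python
import numpy as np

def device_position(set_a_positions, set_b_positions):
    """Average the positions of the two sets of virtual cameras (e.g. sets 33A and 33B)."""
    # Each set nominally shares one position; take its mean to absorb small errors.
    centre_a = np.mean(np.asarray(set_a_positions, dtype=float), axis=0)
    centre_b = np.mean(np.asarray(set_b_positions, dtype=float), axis=0)
    # The apparatus position is the average of the two set positions.
    return 0.5 * (centre_a + centre_b)
```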
Similarly, the orientation of the multi-directional image capture apparatus 10 may be determined by averaging the orientation of the virtual cameras. In more detail, the orientation of the multi-directional image capture apparatus 10 may be determined in the following way.
The orientation of each virtual camera may be represented by a rotation matrix $R_i$. The orientation of the multi-directional image capture apparatus 10 may be represented by a rotation matrix $R_{dev}$. The orientation of each virtual camera relative to the multi-directional image capture apparatus 10 may be known, and may be represented by a rotation matrix $R_{idev}$. Thus, the rotation matrices $R_i$ of the virtual cameras may be used to obtain a rotation matrix for the multi-directional image capture apparatus 10 according to:
$$R_{dev} = R_i R_{idev}^{-1}$$
Put another way, the rotation matrix of a multi-directional image capture apparatus ($R_{dev}$) can be determined by multiplying the rotation matrix of a virtual camera ($R_i$) onto the inverse of the matrix representing the orientation of the virtual camera relative to the orientation of the multi-directional image capture apparatus ($R_{idev}^{-1}$).
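The relation described above, in which the rotation matrix of a virtual camera is multiplied onto the inverse of its known rotation relative to the apparatus, might be sketched as follows. The sketch assumes 3x3 proper rotation matrices, for which the inverse equals the transpose; the function name `device_rotation_estimates` is hypothetical.

```python
import numpy as np

def device_rotation_estimates(camera_rotations, relative_rotations):
    """Return one estimate of the apparatus rotation per virtual camera."""
    estimates = []
    for r_i, r_idev in zip(camera_rotations, relative_rotations):
        r_i = np.asarray(r_i, dtype=float)
        r_idev = np.asarray(r_idev, dtype=float)
        # For a rotation matrix the inverse is its transpose.
        estimates.append(r_i @ r_idev.T)
    return estimates
```

Each virtual camera yields its own estimate, so twelve virtual cameras would give twelve matrices to be averaged.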
For example, if there are twelve virtual cameras (six from each panoramic image 22 of the stereo-pair of panoramic images) corresponding to the multi-directional image capture apparatus 10 (as illustrated in Figure 3A) then twelve rotation matrices are obtained for the orientation of the multi-directional image capture apparatus 10. Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for the multi-directional image capture apparatus 10. The set of Euler angles may then be averaged and converted into a final rotation matrix representing the orientation of the multi-directional image capture apparatus 10. The averaging may be performed according to:
$$\theta_l = \arctan\left(\frac{\sum_{i=0}^{8} \sin(\theta_i)}{\sum_{i=0}^{8} \cos(\theta_i)}\right)$$
where $\theta_l$ represents the averaged Euler angles for a multi-directional image capture apparatus 10 and $\theta_i$ represents the set of Euler angles. Put another way, the averaged Euler angles are determined by calculating the sum of the sines of the set of Euler angles divided by the sum of the cosines of the set of Euler angles, and taking the arctangent of the ratio. $\theta_l$ may then be converted back into a rotation matrix representing the final determined orientation of the multi-directional image capture apparatus 10.
It will be appreciated that the above formula is for the specific example in which there are nine virtual cameras - the maximum value of i may vary according to the number of virtual cameras generated. For example, if there are twelve virtual cameras as illustrated in Figure 3A, then i may take values from zero to eleven.
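The averaging of Euler angles by the arctangent of the ratio of summed sines to summed cosines might be sketched as below. The sketch assumes angles in radians stored one row per virtual camera, and it uses the two-argument arctangent (`np.arctan2`) rather than a plain arctangent of the ratio so that the quadrant of the result is preserved; that substitution is a choice made for this illustration. The function name `mean_euler_angles` is hypothetical.

```python
import numpy as np

def mean_euler_angles(euler_angles):
    """Circular mean of each Euler angle over all virtual cameras (radians)."""
    angles = np.asarray(euler_angles, dtype=float)  # shape (N, 3)
    sin_sum = np.sin(angles).sum(axis=0)
    cos_sum = np.cos(angles).sum(axis=0)
    # arctan2 keeps the correct quadrant where arctan(sin_sum / cos_sum) would not.
    return np.arctan2(sin_sum, cos_sum)
```

Averaging on sines and cosines avoids the wrap-around problem of a plain arithmetic mean, for which angles of 350 degrees and 10 degrees would wrongly average to 180 degrees.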
In some examples, unit quaternions may be used instead of Euler angles for the abovementioned process. The use of unit quaternions to represent orientation is a known mathematical technique and will not be described in detail here. Briefly, quaternions q1, q2, ..., qN corresponding to the virtual camera rotation matrices may be determined. Then, the quaternions may be transformed, as necessary, to ensure that they are all on the same side of the 4D hypersphere. Specifically, one representative quaternion qM is selected and the signs of any quaternions qi where the dot product of qM and qi is less than zero may be inverted. Then, all quaternions qi (as 4D vectors) may be summed into an average quaternion qA, and qA may be normalised into a unit quaternion qA'. The unit quaternion qA' may represent the averaged orientation of the camera and may be converted back to other orientation representations as desired. Using unit quaternions to represent orientation may be more numerically stable than Euler angles.
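The quaternion averaging steps described above (sign alignment against a representative quaternion, summation as 4D vectors, normalisation) might be sketched as follows. The sketch assumes quaternions stored as rows of four components and interprets the sign test as a 4D dot product; the choice of the first quaternion as the representative qM and the function name `average_quaternions` are assumptions made for illustration.

```python
import numpy as np

def average_quaternions(quaternions, ref_index=0):
    """Average unit quaternions after moving them to the same side of the hypersphere."""
    q = np.asarray(quaternions, dtype=float)  # shape (N, 4)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    # Representative quaternion qM.
    q_m = q[ref_index]
    # Invert the sign of any quaternion whose dot product with qM is negative.
    signs = np.where(q @ q_m < 0.0, -1.0, 1.0)
    # Sum as 4D vectors (qA), then normalise to a unit quaternion (qA').
    q_a = (q * signs[:, None]).sum(axis=0)
    return q_a / np.linalg.norm(q_a)
```

A simple normalised sum of this kind is generally a reasonable approximation only when the orientations being averaged are close to one another, as would be expected for multiple estimates of the same apparatus orientation.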
It will be appreciated that the generated positions of the virtual cameras (e.g. from the SfM algorithm) may be in units of pixels. Therefore, in order to enable scale conversions between pixels and a real world distance (e.g. metres), a pixel to real world distance conversion factor may be determined. This may be performed by determining the baseline distance B of a stereo-pair of virtual cameras in both pixels and in a real world distance. The baseline distance in pixels may be determined from the determined positions of the virtual cameras in the reference coordinate system 30. The baseline distance in a real world distance (e.g. metres) may be known already from being set initially during the generation of the panoramic images 22. The pixel to real world distance conversion factor may then be simply calculated by taking the ratio of the two distances. This may be further refined by calculating the conversion factor based on each of the stereo-pairs of virtual cameras, determining outliers and inliers (as described in more detail below), and averaging the inliers to obtain a final pixel to real world distance conversion factor. The pixel to real world distance conversion factor may be denoted $s_{pixel2meter}$ in the present specification. The inlier and outlier determination may be performed according to:
$$d_i = |S_i - \mathrm{Median}(S)|, \quad \forall S_i \in S$$
$$d_a = \mathrm{Median}(\{d_i\})$$
$$\mathrm{inliers}: \frac{d_i}{d_a} < m, \quad \forall i \in N$$
where $S$ is the set of pixel to real world distance ratios of all stereo-pairs of virtual cameras, $d_i$ is a measure of the difference between a pixel to real world distance ratio and the median of all pixel to real world distance ratios, $d_a$ is the median absolute deviation (MAD), and $m$ is a threshold value below which a determined pixel to real world distance ratio is considered an inlier (for example, $m$ may be set to be 2). The MAD may be used as it may be a robust and consistent estimator of inlier errors, which follow a Gaussian distribution.
It will therefore be understood from the above expressions that a pixel to real world distance ratio may be determined to be an inlier if the difference between its value and the median value, divided by the median absolute deviation, is less than a threshold value. That is to say, for a pixel to real world distance ratio to be considered an inlier, the difference between its value and the median value must be less than the threshold value multiplied by the median absolute deviation.
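The inlier test based on the median absolute deviation, followed by averaging of the inliers, might be sketched as follows. The sketch assumes that each input ratio has already been computed as the baseline in pixels divided by the known real world baseline for one stereo-pair of virtual cameras. The handling of a zero median absolute deviation is an assumption added here to avoid division by zero, and the function name `conversion_factor` is hypothetical.

```python
import numpy as np

def conversion_factor(ratios, m=2.0):
    """Average the inlier pixel to real world distance ratios using a MAD test."""
    s = np.asarray(ratios, dtype=float)
    # Absolute deviation of each ratio from the median of all ratios.
    d = np.abs(s - np.median(s))
    # Median absolute deviation (MAD).
    mad = np.median(d)
    if mad == 0.0:
        # Assumption: with zero MAD, only ratios equal to the median are inliers.
        inliers = d == 0.0
    else:
        inliers = d / mad < m
    return s[inliers].mean(), inliers
```

With the example threshold of 2, a ratio is kept only if it lies within two median absolute deviations of the median ratio.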
Once final positions for a plurality of multi-directional image capture apparatuses 10 have been determined, the relative positions of the plurality of multi-directional image capture apparatuses may be determined according to:
$$d_{ij} = \frac{d_{dev}^{j} - d_{dev}^{i}}{s_{pixel2meter}}$$
In the above equation, $d_{ij}$ represents the position of one of the plurality of multi-directional image capture apparatuses (apparatus j) relative to another one of the plurality of multi-directional image capture apparatuses (apparatus i). $d_{dev}^{j}$ is the position of apparatus j and $d_{dev}^{i}$ is the position of apparatus i. $s_{pixel2meter}$ is the pixel to real world distance conversion factor.
As will be understood from the above expression, a vector representing the relative position of one of the plurality of multi-directional image capture apparatuses relative to another one of the plurality of multi-directional image capture apparatuses may be determined by taking the difference between their positions. This may be divided by the pixel-to-real world distance conversion factor depending on the scale desired. As such, the positions of all of the multi-directional image capture apparatuses 10 relative to one another may be determined in the reference coordinate system 30.
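A minimal sketch of the relative position calculation described above is given below. It assumes positions expressed as 3D vectors in the reference coordinate system and a conversion factor expressed in pixels per unit of real world distance; passing a factor of 1 leaves the result in the original scale. The function name `relative_position` is hypothetical.

```python
import numpy as np

def relative_position(position_i, position_j, s_pixel2meter=1.0):
    """Position of apparatus j relative to apparatus i, optionally rescaled."""
    difference = np.asarray(position_j, dtype=float) - np.asarray(position_i, dtype=float)
    # Dividing by the conversion factor converts pixel units to real world units.
    return difference / s_pixel2meter
```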
The baseline distance B described above may be chosen in two different ways. One way is to set a predetermined fixed baseline distance (e.g. based on the average human interpupillary distance) to be used to generate stereo-pairs of panoramic images. This fixed baseline distance may then be used to generate all of the stereo-pairs of panoramic images.
An alternative way is to treat B as a variable within a range (e.g. a range constrained by the dimensions of the multi-directional image capture apparatus) and to evaluate a cost function for each value of B within the range. For example, this may be performed by minimising a cost function which indicates an error associated with the use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
The cost function may be defined as the weighted average of the re-projection error from the structure from motion algorithm and the variance of calculated baseline distances between stereo-pairs of virtual cameras. An example of a cost function which may be used is $E(B) = w_0 \times R(B) + w_1 \times V(B)$, where $E(B)$ represents the total cost, $R(B)$ represents the re-projection error returned by the SfM algorithm by aligning the generated second images from the stereo-pairs displaced by value $B$, $V(B)$ represents the variance of calculated baseline distances, and $w_0$ and $w_1$ are constant weighting parameters for $R(B)$ and $V(B)$ respectively.
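The selection of a baseline distance by evaluating the cost function over a set of candidate values might be sketched as follows. The callables `reprojection_error` and `baseline_variance` are hypothetical placeholders standing for the full processing (panorama generation, re-projection and structure from motion) at a given value of B; the equal default weights and the function name `select_baseline` are likewise assumptions made for illustration.

```python
import numpy as np

def select_baseline(candidates, reprojection_error, baseline_variance, w0=0.5, w1=0.5):
    """Return the candidate baseline with the lowest cost E(B) = w0*R(B) + w1*V(B)."""
    costs = [w0 * reprojection_error(b) + w1 * baseline_variance(b) for b in candidates]
    # The baseline associated with the lowest error is the one to be used.
    best = int(np.argmin(costs))
    return candidates[best], costs[best]
```

Because each evaluation of the cost would involve a complete structure from motion run, the number of candidate baseline distances directly determines the computational cost of this approach.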
As such, the above process may involve generating stereo-pairs of panoramic images for each value of B, generating re-projected second images from the stereo-pairs, and inputting the second images for each value of B into a structure from motion algorithm, as described above. It will be appreciated that the re-projection error from the structure from motion algorithm may be representative of a global registration quality and the variance of calculated baseline distances may be representative of the local registration uncertainty. It will be appreciated that, by evaluating a cost function as described above, the baseline distance with the lowest cost (and therefore lowest error) may be found, and this may be used as the baseline distance used to determine the position/orientation of the multi-directional image capture apparatus 10.
Figure 4 is a flowchart showing examples of operations as described herein.
At operation 4.1, a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received. For example, image data corresponding to the first images 21 may be received at image processing apparatus 50 (see Figure 5).
At operation 4.2, the first images 21 may be processed to generate a plurality of stereo-pairs of panoramic images 22. At operation 4.3, the stereo-pairs of panoramic images 22 may be re-projected to generate re-projected second images 23.
At operation 4.4, the second images 23 from operation 4.3 may be processed to obtain positions and orientations of virtual cameras. For example, the second images 23 may be processed using a structure from motion algorithm. At operation 4.5, a pixel-to-real world distance conversion factor may be determined based on the positions of the virtual cameras determined at operation 4.4 and a baseline distance between stereo-pairs of panoramic images 22.
At operation 4.6, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the virtual cameras determined at operation 4.4.
At operation 4.7, positions of the plurality of multi-directional image capture apparatuses 10 relative to each other may be determined based on the positions of the plurality of multi-directional image capture apparatuses 10 determined at operation 4.6.
It will be appreciated that, as described herein, the position of a virtual camera may be the position of the centre of a virtual lens of the virtual camera. The position of the multi-directional image capture apparatus 10 may be the centre of the multi-directional image capture apparatus (e.g. if a multi-directional image capture apparatus is spherically shaped, its position may be defined as the geometric centre of the sphere).
Figure 5 is a schematic block diagram of an example configuration of image processing (or more simply, computing) apparatus 50, which may be configured to perform any of or any combination of the operations described herein. The computing apparatus 50 may comprise memory 51, processing circuitry 52, an input 53, and an output 54.
The processing circuitry 52 may be of any suitable composition and may include one or more processors 52A of any suitable type or suitable combination of types. For example, the processing circuitry 52 may be a programmable processor that interprets computer program instructions and processes data. The processing circuitry 52 may include plural programmable processors. Alternatively, the processing circuitry 52 may be, for example, programmable hardware with embedded firmware. The processing circuitry 52 may be termed processing means. The processing circuitry 52 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs). In some instances, processing circuitry 52 may be referred to as computing apparatus.
The processing circuitry 52 described with reference to Figure 5 may be coupled to the memory 51 (or one or more storage devices) and may be operable to read/write data to/from the memory. The memory 51 may store thereon computer readable instructions 512A which, when executed by the processing circuitry 52, may cause any one of or any combination of the operations described herein to be performed. The memory 51 may comprise a single memory unit or a plurality of memory units upon which the computer-readable instructions (or code) 512A are stored. For example, the memory 51 may comprise both volatile memory 511 and non-volatile memory 512. For example, the computer readable instructions 512A may be stored in the non-volatile memory 512 and may be executed by the processing circuitry 52 using the volatile memory 511 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, SDRAM, etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc. The memories 51 in general may be referred to as non-transitory computer readable memory media.
The input 53 may be configured to receive image data representing the first images 21 described herein. The image data may be received, for instance, from the multi-directional image capture apparatuses 10 themselves or may be received from a storage device. The output 54 may be configured to output any of or any combination of the camera pose registration information described herein. As discussed above, the camera pose registration information output by the computing apparatus 50 may be used for various functions as described above with reference to Figure 1.
Figure 6 illustrates an example of a computer-readable medium 60 with computer-readable instructions (code) stored thereon. The computer-readable instructions (code), when executed by a processor, may cause any one of or any combination of the operations described above to be performed.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device, whether instructions for a processor or configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term "circuitry" refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagram of Figure 4 is an example only and that various operations depicted therein may be omitted, reordered and/or combined. For example, it will be appreciated that operation 4.5 as illustrated in Figure 4 may be omitted.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.
Claims
1. A method comprising:
processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
2. The method of claim 1, wherein the first images are fisheye images.
3. The method of claim 2, wherein processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images comprises:
de-warping the first images; and
stitching the de-warped images to generate the panoramic images.
4. The method of any one of the preceding claims, wherein the second images are rectilinear images.
5. The method of any one of the preceding claims, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
6. The method of any one of the preceding claims, wherein the panoramic images of each stereo-pair are offset from each other by a baseline distance.
7. The method of claim 6, wherein the baseline distance to be used is a predetermined fixed distance.
8. The method of claim 6, wherein the baseline distance to be used is determined by:
minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances; and
determining that the baseline distance associated with the lowest error is to be used.
9. The method of claim 8, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and wherein the cost function is a weighted average of:
re-projection error from the structure from motion algorithm; and
variance of calculated baseline distances between stereo-pairs of virtual cameras.
10. The method of any one of claims 6 to 9, further comprising:
determining a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
11. The method of any one of the preceding claims, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the method further comprises:
based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.
12. Apparatus configured to perform a method according to any one of claims 1 to 11.
13. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method according to any one of claims 1 to 11.
14. Apparatus comprising:
at least one processor; and
at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to:
process a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
perform image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
process the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
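The final step of claim 14 does not state how an apparatus position is derived from the virtual camera positions. As a minimal sketch, and purely as an assumption, the position of each multi-directional image capture apparatus could be taken as the mean of the positions of the virtual cameras associated with it. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def apparatus_positions(virtual_cameras):
    """Average the virtual camera positions belonging to each apparatus.

    virtual_cameras: iterable of (apparatus_id, (x, y, z)) tuples.
    Returns a dict mapping apparatus_id to its mean (x, y, z) position.
    """
    groups = defaultdict(list)
    for apparatus_id, position in virtual_cameras:
        groups[apparatus_id].append(position)
    return {
        apparatus_id: tuple(sum(axis) / len(positions) for axis in zip(*positions))
        for apparatus_id, positions in groups.items()
    }


# Hypothetical reconstruction output for two apparatuses.
cams = [("A", (0, 0, 0)), ("A", (2, 0, 0)), ("B", (1, 1, 1))]
positions = apparatus_positions(cams)
```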
15. The apparatus of claim 14, wherein the first images are fisheye images.
16. The apparatus of claim 15, wherein processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images comprises:
de-warping the first images; and
stitching the de-warped images to generate the panoramic images.
17. The apparatus of any one of claims 14 to 16, wherein the second images are rectilinear images.
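The re-projection of a panoramic image into a rectilinear second image can be sketched as a per-pixel coordinate mapping. The sketch assumes an equirectangular panorama and a pinhole virtual camera rotated only about the vertical axis; neither assumption comes from the claims, and the function name and parameters are hypothetical.

```python
import math


def rectilinear_to_equirect(u, v, out_w, out_h, fov_deg, yaw_deg, pano_w, pano_h):
    """Map a rectilinear output pixel (u, v) to panorama coordinates.

    fov_deg is the horizontal field of view of the virtual camera and
    yaw_deg its rotation about the vertical axis.
    """
    # Focal length in pixels from the horizontal field of view.
    f = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)
    # Viewing ray in the camera frame: x right, y down, z forward.
    x, y, z = u - out_w / 2, v - out_h / 2, f
    # Rotate the ray about the vertical axis by the camera yaw.
    yaw = math.radians(yaw_deg)
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = -x * math.sin(yaw) + z * math.cos(yaw)
    # Convert the ray to longitude and latitude, then to panorama pixels.
    lon = math.atan2(xr, zr)
    lat = math.atan2(y, math.hypot(xr, zr))
    px = (lon / (2 * math.pi) + 0.5) * pano_w
    py = (lat / math.pi + 0.5) * pano_h
    return px, py


# Centre pixel of a 100x100 view with a 90 degree field of view, yaw 0.
centre = rectilinear_to_equirect(50, 50, 100, 100, 90, 0, 2000, 1000)
```

Sampling the panorama at the returned coordinates for every output pixel would produce one rectilinear second image per virtual camera orientation.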
18. The apparatus of any one of claims 14 to 17, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
19. The apparatus of any one of claims 14 to 18, wherein the panoramic images of each stereo-pair are offset from each other by a baseline distance.
20. The apparatus of claim 19, wherein the baseline distance to be used is a predetermined fixed distance.
21. The apparatus of claim 19, wherein the baseline distance to be used is determined by:
minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances; and
determining that the baseline distance associated with the lowest error is to be used.
22. The apparatus of claim 21, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and wherein the cost function is a weighted average of:
re-projection error from the structure from motion algorithm; and
variance of calculated baseline distances between stereo-pairs of virtual cameras.
23. The apparatus of any one of claims 19 to 22, wherein the computer program code, when executed by the at least one processor, causes the apparatus to:
determine a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
24. The apparatus of any one of claims 14 to 23, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, causes the apparatus to:
determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
25. A computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causes performance of:
processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
26. Apparatus comprising:
means for processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
means for performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1702680.8A GB2560301A (en) | 2017-02-20 | 2017-02-20 | Methods and apparatuses for determining positions of multi-directional image capture apparatuses |
GB1702680.8 | 2017-02-20 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2018150086A2 (en) | 2018-08-23 |
WO2018150086A3 (en) | 2018-10-25 |
Family
ID=58486857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FI2018/050095 Ceased WO2018150086A2 (en) | 2017-02-20 | 2018-02-12 | Methods and apparatuses for determining positions of multi-directional image capture apparatuses |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB2560301A (en) |
WO (1) | WO2018150086A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114760458A (en) * | 2022-04-28 | 2022-07-15 | 中南大学 | Method for synchronizing tracks of virtual camera and real camera of high-reality augmented reality studio |
WO2023098737A1 (en) * | 2021-11-30 | 2023-06-08 | 中兴通讯股份有限公司 | Three-dimensional reconstruction method, electronic device, and computer-readable storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003524927A (en) * | 1998-09-17 | 2003-08-19 | イッサム リサーチ ディベロップメント カンパニー オブ ザ ヘブリュー ユニバーシティ オブ エルサレム | System and method for generating and displaying panoramic images and videos |
JP5481337B2 (en) * | 2010-09-24 | 2014-04-23 | 株式会社東芝 | Image processing device |
JP6126820B2 (en) * | 2012-11-09 | 2017-05-10 | 任天堂株式会社 | Image generation method, image display method, image generation program, image generation system, and image display apparatus |
US9892493B2 (en) * | 2014-04-21 | 2018-02-13 | Texas Instruments Incorporated | Method, apparatus and system for performing geometric calibration for surround view camera solution |
US11205305B2 (en) * | 2014-09-22 | 2021-12-21 | Samsung Electronics Company, Ltd. | Presentation of three-dimensional video |
JP2016171463A (en) * | 2015-03-12 | 2016-09-23 | キヤノン株式会社 | Image processing system, image processing method, and program |
- 2017-02-20: GB application GB1702680.8A, published as GB2560301A (status: Withdrawn)
- 2018-02-12: WO application PCT/FI2018/050095, published as WO2018150086A2 (status: Ceased)
Also Published As
Publication number | Publication date |
---|---|
WO2018150086A3 (en) | 2018-10-25 |
GB201702680D0 (en) | 2017-04-05 |
GB2560301A (en) | 2018-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3428875A1 (en) | Methods and apparatuses for panoramic image processing | |
US11748906B2 (en) | Gaze point calculation method, apparatus and device | |
US10334168B2 (en) | Threshold determination in a RANSAC algorithm | |
WO2021139176A1 (en) | Pedestrian trajectory tracking method and apparatus based on binocular camera calibration, computer device, and storage medium | |
US11216979B2 (en) | Dual model for fisheye lens distortion and an algorithm for calibrating model parameters | |
US10565803B2 (en) | Methods and apparatuses for determining positions of multi-directional image capture apparatuses | |
CN114187344B (en) | Map construction method, device and equipment | |
GB2567245A (en) | Methods and apparatuses for depth rectification processing | |
Brückner et al. | Intrinsic and extrinsic active self-calibration of multi-camera systems | |
US8019180B2 (en) | Constructing arbitrary-plane and multi-arbitrary-plane mosaic composite images from a multi-imager | |
CN111402136A (en) | Panorama generation method and device, computer readable storage medium and electronic equipment | |
WO2018100230A1 (en) | Method and apparatuses for determining positions of multi-directional image capture apparatuses | |
JP2016114445A (en) | Three-dimensional position calculation device, program for the same, and cg composition apparatus | |
WO2018150086A2 (en) | Methods and apparatuses for determining positions of multi-directional image capture apparatuses | |
Fan et al. | Light fields stitching for windowed-6DoF VR content | |
Ha et al. | Embedded panoramic mosaic system using auto-shot interface | |
JP2005275789A (en) | 3D structure extraction method | |
JP2005063012A (en) | Omnidirectional camera motion and three-dimensional information restoration method, apparatus and program thereof, and recording medium recording the same | |
Brückner et al. | Active self-calibration of multi-camera systems | |
JP3452188B2 (en) | Tracking method of feature points in 2D video | |
Dunn et al. | A geometric solver for calibrated stereo egomotion | |
Peng et al. | A low-cost implementation of a 360° vision distributed aperture system | |
Kim et al. | Environment modelling using spherical stereo imaging | |
CN116489488B (en) | Method and device for determining delay between camera and catcher and electronic equipment | |
JP6071142B2 (en) | Image converter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18753825; Country of ref document: EP; Kind code of ref document: A2 |