How is a robot positioned when grasping? What sensors are used for detection?
The robot hand corrects its accumulated error at regular intervals against reference (anchor) points and the zero position; the most advanced form of grasping uses image processing to determine the position of the target object.
Generally, positioning is done by imaging; the sensor is a CMOS or CCD image sensor.
Through pre-programming and its execution, the feedback signal from the encoder on the servo motor shaft is returned to the controller, which fine-tunes any positioning deviation.
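As a rough illustration of that feedback loop, here is a minimal Python sketch of proportional fine adjustment; `read_encoder_counts`, `send_velocity_command`, and the gain/resolution constants are hypothetical placeholders, not the API of any real servo drive.

```python
# Minimal sketch of closed-loop position correction from encoder feedback.
# read_encoder_counts() and send_velocity_command() are hypothetical placeholders
# standing in for whatever the servo drive / motion controller actually exposes.

COUNTS_PER_MM = 1000.0   # assumed encoder resolution
KP = 0.5                 # proportional gain for the fine adjustment
TOLERANCE_MM = 0.01      # stop correcting once the error is this small

def fine_adjust(target_mm, read_encoder_counts, send_velocity_command):
    """Repeatedly compare the commanded position with the encoder reading
    and issue small corrective velocity commands until the error is tiny."""
    while True:
        actual_mm = read_encoder_counts() / COUNTS_PER_MM
        error_mm = target_mm - actual_mm
        if abs(error_mm) < TOLERANCE_MM:
            send_velocity_command(0.0)
            break
        send_velocity_command(KP * error_mm)  # simple proportional correction
```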
In robotics, machine vision and computer vision differ somewhat: the purpose of machine vision is to give the robot information about the objects it manipulates. Research in machine vision therefore roughly breaks down into these pieces:
Object recognition: detecting what kind of object appears in the image, which overlaps heavily with CV research;
Pose estimation: computing the position and orientation of the object in the camera coordinate system. To grasp something, a robot needs to know not only what it is, but also where it is.
Camera calibration: since everything above only gives the object's coordinates in the camera coordinate system, we also need the relative pose of the camera and the robot, so that the object's pose can be converted into the robot's coordinate system.
Of course, I mainly focus on machine vision for object grasping; SLAM and other areas will not be discussed here for now.
Because vision is a very important part of robot perception, there is a lot of research on it. Let me introduce some of what I know, from simple to complex:
0. Camera calibration
This is actually a relatively mature field. All the object recognition below only computes the object's pose in the camera coordinate system, but to manipulate the object the robot needs its pose in the robot coordinate system. So we must first calibrate the camera's pose. Intrinsic calibration needs little discussion; refer to Zhang Zhengyou's paper or the various calibration toolboxes. For extrinsic calibration, there are two cases depending on where the camera is mounted:
Eye-to-hand: the camera is rigidly fixed with respect to the robot's base coordinate system and does not move with the arm.
Eye-in-hand: the camera is mounted on the arm and moves with it. The two cases are solved in a similar way. First, eye-to-hand:
Simply fix a chessboard to the end of the arm and move it through several poses within the camera's field of view. The camera can compute the pose of the chessboard relative to the camera coordinate system, the robot's forward kinematics gives the transform from the robot base to the end-effector, and the transform between the end-effector and the chessboard is fixed. Together these form a closed loop of coordinate transforms from which the camera-to-base transform can be solved.
The eye-in-hand case is similar: fix a chessboard on the ground (rigidly attached to the robot base), move the arm, carrying the camera, through several poses, and a new loop of coordinate transforms is formed.
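For the procedure just described, a minimal sketch using OpenCV's `cv2.calibrateHandEye` (available in OpenCV 4.1+) might look like the following; the pose lists are assumed to have been collected as above (forward kinematics for the gripper poses, chessboard PnP for the camera poses), and this is an illustration rather than the author's actual setup.

```python
import cv2
import numpy as np

# Hedged sketch of eye-in-hand calibration with OpenCV's calibrateHandEye.
# R_gripper2base / t_gripper2base: lists of rotations/translations from forward
# kinematics at each arm pose; R_target2cam / t_target2cam: chessboard poses in
# the camera frame (e.g. from cv2.solvePnP) at the same arm poses.

def hand_eye_from_poses(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI,
    )
    # Assemble a 4x4 homogeneous transform from the camera frame to the gripper frame.
    T = np.eye(4)
    T[:3, :3] = R_cam2gripper
    T[:3, 3] = t_cam2gripper.ravel()
    return T
```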
1. Plane target detection
This is currently the most common scenario on industrial assembly lines. The requirements for vision here are: fast, accurate, and stable. So the simplest edge extraction plus edge matching / shape matching methods are generally used, and to improve stability, system variability is usually reduced by controlling the lighting and using a high-contrast background.
Many smart cameras (such as Cognex) have these functions built in directly. Since the object generally lies on a plane, the camera only needs to compute a three-degree-of-freedom pose. This kind of application also usually handles one specific workpiece, so it amounts to pose estimation without object recognition. Pursuing stability in industry is understandable, but with the growing demand for production automation and the rise of service robots, full pose estimation of more complex objects has become a research hotspot in machine vision.
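As a hedged sketch of this kind of planar, three-degree-of-freedom localization, the following assumes a high-contrast, evenly lit scene so that a simple threshold isolates the workpiece; a real smart camera uses far more robust shape-matching internals.

```python
import cv2

# Minimal sketch of planar (x, y, theta) estimation, assuming a back-lit or
# high-contrast scene so that a global threshold isolates the part.

def planar_pose(gray_image):
    _, binary = cv2.threshold(gray_image, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    part = max(contours, key=cv2.contourArea)        # assume the largest blob is the workpiece
    (cx, cy), (w, h), angle = cv2.minAreaRect(part)  # 3-DOF pose in pixels / degrees
    return cx, cy, angle
```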
2. Textured objects
Robot vision first studied textured objects, such as beverage bottles and fast-food boxes with rich surface texture. These objects can still be handled with methods similar to edge extraction plus template matching, but in real robot operation the environment is more complicated: the lighting is uncertain (illumination), the distance between the object and the camera is uncertain (scale), the angle from which the camera views the object is uncertain (rotation, affine), and the object may even be blocked by other objects (occlusion).
Fortunately, Lowe proposed an extremely robust local feature called SIFT (Scale-Invariant Feature Transform): Lowe, David G. "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision 60.2 (2004): 91-110. The details can be found in that paper or in any of the blogs citing it (40,000+ citations). Simply put, the extracted feature points depend only on a local patch of the object's surface texture and are invariant to illumination change, scale change, and affine transformation, independent of the object as a whole. So with SIFT feature points, we can directly find in the camera image the same feature points stored in the database, and thereby determine what the object in front of the camera is (object recognition).
For rigid objects, the positions of the feature points in the object coordinate system are fixed. So once we obtain enough point correspondences, we can directly solve for the homography between the object in the camera image and the object in the database. If we use a depth camera (such as a Kinect) or stereo vision, we can determine the 3D position of each feature point; then, by solving the resulting PnP problem, we can compute the pose of the object in the current camera coordinate system.
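Here is a minimal sketch of that pipeline with OpenCV, assuming the database already stores a SIFT descriptor and the corresponding 3D point (in the object frame) for each model feature, and that the camera intrinsic matrix `K` is known; it only illustrates the idea and is not the author's actual system.

```python
import cv2
import numpy as np

# Hedged sketch of the SIFT -> match -> PnP pipeline.
# db_descriptors: SIFT descriptors of the object model; object_points: the 3D
# position (object frame) of each database feature; K: 3x3 camera intrinsics.

def estimate_pose(query_image, db_descriptors, object_points, K):
    sift = cv2.SIFT_create()
    kp, desc = sift.detectAndCompute(query_image, None)

    # Ratio-test matching between image features and database features.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(desc, db_descriptors, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 6:
        return None

    img_pts = np.float32([kp[m.queryIdx].pt for m in good])
    obj_pts = np.float32([object_points[m.trainIdx] for m in good])

    # PnP with RANSAC gives the object pose in the camera coordinate system.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```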
The following is a result from a senior student who graduated from our laboratory earlier. Of course, many details are needed to make this usable in real operation, for example: removing the influence of the background with point-cloud segmentation and Euclidean clustering, choosing objects with stable features (SIFT sometimes changes), and accelerating the matching with Bayesian methods. Besides SIFT, there are many similar feature points that came later, such as SURF and ORB.
3. Objects without texture
Textured objects are relatively easy to handle, but there are still many objects in everyday life and industry that have no texture:
The most obvious idea is: is there a feature point that can describe the shape of an object and has SIFT-like invariance? Unfortunately, as far as I know, no such feature point exists yet. So a large class of methods still uses template-based matching, but with specially chosen matching features (not just simple features such as edges).
In short, this line of work (LineMod, mentioned below, is the representative example) uses image gradients from the color image and surface normals from the depth image as features to match against templates in a database. Because the database templates are generated by imaging the object from multiple viewpoints, the pose obtained from the matching is only a rough estimate and is not accurate. However, once we have a rough initial pose, we can register the object model against the 3D point cloud with the ICP algorithm and obtain an accurate pose of the object in the camera coordinate system.
Of course, there are many implementation details in this algorithm: how to build the templates, how to represent the color gradients, and so on. In addition, this method cannot handle occluded objects (lowering the matching threshold can tolerate partial occlusion, but it introduces false detections). For partial occlusion, Dr. Zhang in our laboratory improved LineMod last year, but since that paper has not been published yet, I will not go into it here.
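A hedged sketch of the ICP refinement step follows, using Open3D purely as an illustrative implementation (the original work has its own pipeline): `initial_pose` is the rough 4x4 transform from the template match, and both clouds are Open3D point clouds.

```python
import open3d as o3d

# Sketch of refining a rough template-matching pose with ICP.
# initial_pose: 4x4 transform taking the model into the camera frame,
# as estimated by the template match; both inputs are o3d.geometry.PointCloud.

def refine_pose(model_cloud, scene_cloud, initial_pose, max_corr_dist=0.01):
    result = o3d.pipelines.registration.registration_icp(
        model_cloud, scene_cloud, max_corr_dist, initial_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    return result.transformation  # refined pose of the object in the camera frame
```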
4. Deep learning
Since deep learning has achieved very good results in computer vision, we in robotics naturally try to use DL for robot object recognition.
First, for object recognition, the DL results from computer vision can be carried over directly with all kinds of CNNs. In my Zhihu answer to "Are there attempts to integrate deep learning into robotics? What are the difficulties?" I mentioned that many teams used DL as their object recognition algorithm in the 2016 Amazon Picking Challenge. However, although many used DL for recognition in that competition, object pose estimation was still done with simple or traditional algorithms; DL does not seem to be widely adopted there yet. As @Zhou said, the usual approach is to segment the object in the color image with a semantic segmentation network, and then run ICP between the segmented point cloud and the 3D model of the object.
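A rough sketch of the back-projection step in that pipeline (segment in the color image, then ICP only the masked points against the model); the pinhole intrinsics `fx, fy, cx, cy` and a depth image in metres are assumed.

```python
import numpy as np

# Back-project only the pixels the segmentation network labelled as the object
# into a 3D point cloud in the camera frame; this cloud is then registered
# against the object model with ICP (as sketched earlier).

def masked_point_cloud(depth_m, mask, fx, fy, cx, cy):
    v, u = np.nonzero(mask)           # pixel coordinates inside the object mask
    z = depth_m[v, u]
    valid = z > 0                     # drop missing depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```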
Of course, it is also possible to estimate the pose directly with a neural network.
The method goes roughly like this: for an object, sample many small RGB-D patches (caring only about one patch at a time, and using local features to cope with occlusion); each patch has a coordinate relative to the object coordinate system; an autoencoder is first used to reduce the dimensionality of the data, and then a Hough forest is trained on the reduced features.
5. Integration with task/motion planning
This is also an interesting research area. Because the purpose of machine vision is to provide information for the robot to manipulate objects, it is not limited to recognizing and localizing objects in the camera image; it often needs to be combined with the robot's other modules.
Suppose we ask the robot to fetch a bottle of Sprite from the refrigerator, but the Sprite is blocked by a bottle of Mirinda. A human would simply move the Mirinda out of the way first and then take the Sprite. So the robot needs to confirm visually that the Sprite is behind the Mirinda, and also make sure the Mirinda can actually be moved, unlike fixed objects such as the refrigerator door. Combining vision with the rest of the robot leads to many other interesting topics, but since this is not my own research direction, I will not go on about it.
The positioning of the machine is determined first of all by the engineering design, which fixes the achievable positioning accuracy in advance. The feedback signal from the encoder on the servo motor shaft is processed by the servo drive and then used for automatic fine tuning.
The robot's multi-station motions and their positioning during execution are determined by manual programming and, for the time being, have nothing to do with sensors. If you want to improve them to suit the production process, you must rewrite the program or modify and adjust the original program.
Robot grasping and positioning are pre-programmed: the output of the industrial PC drives the servo motors to position accurately, and the feedback signal from the servo motor encoders passes through the motor drive card back to the industrial PC for further adjustment. If the positioning error detected by the sensor is very large, the positioning accuracy can no longer be corrected by fine adjustment alone.
At present, the most common way for robots to locate a grasp is visual positioning: a CCD/CMOS camera photographs the current field of view, finds the marking points, computes the offset coordinates and angle, and feeds the data back to the robot over an Ethernet or serial port; the robot then makes the corresponding correction.
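A minimal sketch of that photograph-offset-feedback loop; `detect_mark` is a hypothetical stand-in for whatever routine finds the marking point (for instance the contour or feature methods sketched earlier), the CSV message format is assumed, and pyserial is used for the serial link.

```python
import serial  # pyserial, assuming the robot listens on a serial port

# Hedged sketch: find the mark, compute the offset from the taught reference,
# and send the correction to the robot. detect_mark() and the CSV protocol are
# hypothetical placeholders, not any particular robot vendor's interface.

def send_correction(image, taught_x, taught_y, taught_angle,
                    detect_mark, port="/dev/ttyUSB0", baud=115200):
    x, y, angle = detect_mark(image)          # mark pose in the camera/image frame
    dx, dy, dtheta = x - taught_x, y - taught_y, angle - taught_angle
    msg = f"{dx:.3f},{dy:.3f},{dtheta:.3f}\n"  # simple assumed CSV message
    with serial.Serial(port, baud, timeout=1) as link:
        link.write(msg.encode("ascii"))
    return dx, dy, dtheta
```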
- Manager Deng, Zhonghe Hangxun Technology Co., Ltd.
The positioning of the robot's motion is determined first by manual programming, which specifies its positions in space. The encoder on the servo motor shaft feeds back the positioning accuracy, and the servo drive card sends it to the controller for processing and automatic fine adjustment.